
Huawei FusionServer RH2288 V3 : The not so ready, “vSAN READY NODE”

On the face of it, this heading may seem as if I’m taking a shot at Huawei. I want to make it clear that this post is in no way aimed at bad-mouthing Huawei or any other vendor. It is merely meant to assist anyone else in the community who may face a problem similar to the one I had with my vSAN deployment on this specific hardware platform.

This was my first VMware vSAN deployment and the “nerd” in me was super excited. I knew that I would be learning a “whack load” of cool new things. Fortunately I had read “The essential vSAN guide” by Duncan Epping and Cormac Hogan while I was studying for my VCAP6-DCD exam, so I had a good technical understanding of how vSAN works and how it should be deployed.

As with many new vSAN deployments, a lot of research goes into finding the correct hardware. VMware has “pre-certified” certain vendor hardware to make our jobs easier. VMware refers to these as “vSAN Ready Nodes”; you can get more info here.

The hardware vendor we used was Huawei and the model is the “RH2288H v3”. We opted for the All-Flash configuration. The most important component with any vendor choice is the storage controller. Because this server was certified by VMware as a ready node, I expected no installation problems. I don’t think I was prepared for two days of troubleshooting. Anyway, I’ll get to that later…

Before you begin installing anything, there are a couple of things you’ll need to change in the BIOS. The BIOS on all Huawei servers is protected with a default password, “Huawei12#$”. This can be disabled or changed within the BIOS setup utility. If you are going to be enabling “EVC” on your cluster, you will need to enable “Monitor/MWait”.

huawei_bios

The next thing we need to do is configure the RAID controller for “Pass-through”. The RAID controller that ships with the RH2288H v3 is Huawei’s own “SR 430C / RU 430C”, which is based on the LSI3108 chipset. Our vSAN nodes required two disk groups, and for that reason we also got an additional PCIe RAID controller for the second disk group. To get into the RAID card BIOS, press the standard “CTRL-R” during RAID card initialisation. Once in, select your RAID controller and enable JBOD. In my case I had to do this for both controllers. Save your config and restart.

LSI3108
I’m not going to post about how to install ESXi on this node. The process is similar to the one in my post on Installing ESXi on Huawei : RH5885H V3 (FusionServer). The one exception is an updated driver that you can obtain from Huawei’s support site here. Download the latest version available. We’ll need this later in the post.

Now that you have ESXi installed, you’ll need to do all the basic configuration and get it connected to your VCSA. In my deployment I made use of LAGs and therefore required some additional configuration to get this up. This is not a requirement for vSAN; you can use a standard virtual switch.
Once you have your port groups configured, you’ll need to configure the vmkernel ports for vSAN. You have the option to create a new TCP/IP stack for vSAN or you can use the “Default Stack”. The former option is mainly used when you have a stretched vSAN cluster and require routing across your vSAN vmkernel ports. In my design this was not a requirement.

vmk_create_vsan
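
If you want to double check the vmkernel setup from the ESXi shell rather than the web client, something along these lines should work. This is only a rough sketch: the interface name vmk2 and the peer address are placeholders of mine and not values from this deployment, and the parsing assumes the "VmkNic Name:" line that esxcli vsan network list normally prints.

```shell
#!/bin/sh
# Sketch: list the vmkernel ports that are tagged for vSAN traffic.
# VSAN_VMK and PEER_IP are assumed placeholders; substitute your own values.
VSAN_VMK="${VSAN_VMK:-vmk2}"
PEER_IP="${PEER_IP:-192.168.50.12}"

# Reads `esxcli vsan network list` output on stdin and prints the
# interface names found on the "VmkNic Name:" lines.
vsan_vmks() {
  awk -F': ' '/VmkNic Name/ { print $2 }'
}

# Only touch the host tools when they exist (i.e. when run on ESXi).
if command -v esxcli >/dev/null 2>&1; then
  echo "vmkernel ports tagged for vSAN:"
  esxcli vsan network list | vsan_vmks
  # Check that another node's vSAN port is reachable over this interface.
  vmkping -I "$VSAN_VMK" -c 3 "$PEER_IP"
fi
```

If the list comes back empty, the vmkernel port has not been enabled for vSAN traffic yet.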

The next step would be to enable vSAN at the cluster level. It’s at this point that I discovered the “vSAN Ready Node” was far from ready for anything.
If you enable vSAN at this point, you will notice that the disk group creation gets stuck at 21% and the host becomes unresponsive.

You will see errors similar to this in the vmkernel.log:

VSAN error

This is a known issue and VMware has a KB for this. Although it’s for a Dell PERC controller, further research has also linked it to Huawei’s “SR 430C / RU 430C”.

To resolve the issue, I restarted the host and reconfigured the RAID controllers to remove the disks from the host. This can be done by entering the RAID controller configuration utility and disabling “JBOD”.

The host will start up and connect back to vCenter. Now we’re going to follow the KB article related to the problem. Enable SSH on the host, connect to it and execute:

esxcli system module set --enabled=false --module=lsi_mr3

This will force ESXi to use the correct driver which is megaraid_sas.
Restart the host.
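
Once the host is back up, it’s worth confirming that the change actually stuck. The sketch below is my own helper and not something from the KB; it assumes the usual three-column output (Name, Is Loaded, Is Enabled) of esxcli system module list.

```shell
#!/bin/sh
# Sketch: report the load/enable state of a driver module, based on the
# output of `esxcli system module list` (columns: Name, Is Loaded, Is Enabled).
module_state() {
  awk -v mod="$1" '
    $1 == mod { print mod ": loaded=" $2 " enabled=" $3; found = 1 }
    END { if (!found) print mod ": not present" }'
}

# Only query the host when esxcli exists (i.e. when run on ESXi).
if command -v esxcli >/dev/null 2>&1; then
  mods=$(esxcli system module list)
  echo "$mods" | module_state lsi_mr3        # should be disabled after the change
  echo "$mods" | module_state megaraid_sas   # should now be the active driver
  # The Driver column shows which driver each storage adapter is bound to.
  esxcli storage core adapter list
fi
```

If lsi_mr3 still shows up as loaded, the host has most likely not been restarted since the module was disabled.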

I decided to go over Huawei’s documentation to see if there was any known issue related to this and discovered that they have a separate utility called “iDriver”. This tool will check all the firmware and driver versions on the host and update them if needed. You can find the tool here.

Extract the tool and copy it to your host using scp. Open another SSH session to your host and execute install_driver.sh.
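
From a Linux or Mac workstation, the copy and launch could look roughly like the sketch below. The host name, the local folder name and the /tmp target are all assumptions of mine; only install_driver.sh comes from the tool itself. By default the script just prints the commands, so you can check them before setting RUN=1.

```shell
#!/bin/sh
# Sketch: copy the extracted iDriver folder to a host and start the installer.
# ESXI_HOST, IDRIVER_DIR and the /tmp target are assumed placeholders.
ESXI_HOST="${ESXI_HOST:-esxi01.example.local}"
IDRIVER_DIR="${IDRIVER_DIR:-./iDriver}"
REMOTE_DIR="/tmp/$(basename "$IDRIVER_DIR")"

# Print each command instead of running it unless RUN=1 is set.
run() {
  if [ "${RUN:-0}" = "1" ]; then
    "$@"
  else
    echo "+ $*"
  fi
}

# Copy the whole folder, then open an interactive session (-t) so the
# installer menu can prompt for input.
run scp -r "$IDRIVER_DIR" "root@$ESXI_HOST:/tmp/"
run ssh -t "root@$ESXI_HOST" "cd $REMOTE_DIR && sh ./install_driver.sh"
```

Keep in mind that /tmp on ESXi is small, so a datastore path may be the safer target if the package is large.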

You will be presented with an iDriver installer menu.

idriver_menu

From here you can either choose to automatically install the required driver and firmware versions or validate the current info. I’m going to select option two, but you would select option one if this was a new installation. You can then reboot and use option two to validate the firmware and driver status. You should get output such as the below:

Screenshot from 2017-02-22 14-00-59

After this you can reboot your ESXi node and reconfigure the RAID controller.

You should be able to enable and configure vSAN.
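
Before creating the disk groups, a quick check from the ESXi shell can tell you whether the disks are visible and usable. This is a hedged sketch: it assumes vdq -q reports each disk with a "State" line reading "Eligible for use by VSAN", which is what I’d expect on this ESXi generation, but verify against your own output.

```shell
#!/bin/sh
# Sketch: count the disks that `vdq -q` reports as eligible for vSAN.
# Reads the vdq output on stdin; "Ineligible" lines are not counted because
# the pattern requires a quote directly before the capital E.
count_eligible() {
  grep -c '"State".*"Eligible for use by VSAN"'
}

# Only query the host when vdq exists (i.e. when run on ESXi).
if command -v vdq >/dev/null 2>&1; then
  echo "Disks eligible for vSAN: $(vdq -q | count_eligible)"
  # Disks that have already been claimed into disk groups.
  esxcli vsan storage list
fi
```

With JBOD enabled on both controllers, the count should match the number of cache and capacity disks in the node.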

 

Installing ESXi on Huawei : RH5885H V3 (FusionServer)

huawei_logo

Writing a blog post on a Huawei server seems strange, but with Huawei growing in popularity, especially in the South African market, it’s probably overdue.

In this post I will be covering my experience with my first Huawei server and detailing some issues I experienced during my deployment.

I’ve had the opportunity to work with the FusionServer (RH5885H v3). For those who do not know what “FusionServer” is, it’s basically Huawei punting virtualization on this specific hardware, whether it be VMware, Hyper-V or other popular hypervisors. They probably won’t admit to it, but I’m sure that they would prefer you to run their own virtualization software, “FusionSphere”. Anyway, I will leave it at that. If you would like to know more about FusionSphere, refer to their site here.

These are the maximum configurable specifications for this server.

rh5885h_specs

As you can see, it’s able to take a good amount of resources. The spec that I used was 4 sockets, 15 cores and 2TB of memory.

Installing ESXi.

According to Huawei, you will need to build your own ESXi image and inject all their drivers into the image. I found this to be a bit of a pain as most vendors I’ve worked with do this for all supported versions of ESXi. I’m hoping that I can convince Huawei to start doing this.

I used Huawei’s official OS installation guide to build a customized ESXi image. The official OS installation guide can be found here.

You will need four components to build the customized ESXi image.

  • ESXi-Customizer – Available here.
    Andreas Peetz has done some awesome work on this tool. I also recommend you use the new PowerCLI version and not the old version referenced in the Huawei documentation.
  • The correct ESXi version you would like to install.
    Available from VMware’s site. You need to use the offline .zip bundle.
  • The Huawei driver installation package.
    Available from Huawei’s site, follow the Huawei OS Install guide.
  • PowerCli 5.x or later.
    Also available from VMware’s site.

You can follow this video for a detailed “how to” guide on creating your customized image.

 

Preparing the Hardware

Before you start installing ESXi from your newly created media, you need to make some changes to the server’s BIOS and, more importantly, update the BIOS firmware. I found that the BIOS was missing a crucial feature that would allow the host to be added to an “EVC” enabled cluster.

To update the firmware on your BIOS, speak to your Huawei representative to assist. You need to ensure that you are not on a firmware version older than (U117) V322.

huawei_bios_fw

Next, you need to configure the default options to enable virtualization in the BIOS as you would on any other server. The one option in this BIOS you need to make sure is enabled is this:

huawei_bios

NOTE: Huawei thought it would be a good idea to have a default password on the BIOS, which is “Huawei123$”.

 

This is the reason we had to update the BIOS. This option is not available on the older firmware version and, as I mentioned before, you won’t be able to add this host to an “EVC” enabled cluster without this option enabled.

Once you have completed all these steps you can start installing your server as normal.