VCSA – PSC Services failing to start

Recently I came across a strange issue where one of my PSCs in a distributed setup decided to disable all of the required PSC services from starting up.

The issue started after an F5 load balancer in front of the PSCs failed. Whether that failure was the actual cause of the PSC disabling its services is debatable… it's highly unlikely.

The fact that the services had been set to disabled at startup was not apparent at first. I only noticed it when trying to stop and start the services.

When starting the services you would see something like this:

service-control --start --all
INFO:root:Service: vmafdd, Action: start
Service: vmafdd, Action: start
2017-04-18T12:07:49.725Z Running command: ['/sbin/chkconfig', u'vmafdd']
2017-04-18T12:07:49.801Z Done running command
'vmafdd' startMode is Manual, skipping to start:
INFO:root:'vmafdd' startMode is Manual, skipping to start:
INFO:root:Service: vmware-rhttpproxy, Action: start
Service: vmware-rhttpproxy, Action: start
2017-04-18T12:07:49.803Z Running command: ['/sbin/chkconfig', u'vmware-rhttpproxy']
2017-04-18T12:07:49.878Z Done running command
'vmware-rhttpproxy' startMode is Manual, skipping to start:
INFO:root:'vmware-rhttpproxy' startMode is Manual, skipping to start:
INFO:root:Service: vmdird, Action: start

You'll notice this across all the services. When checking the service status with "service-control --status --all", you'll see that the services don't actually start; obviously they won't, as they are set to be disabled.

Running "chkconfig -A" will list all the services and their startup configuration status.
In my case all the required services were set to off.

To resolve the issue I had to set all the services to "on" again. This can be done using chkconfig as well.
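A minimal sketch of the fix from the PSC shell; the service names below are the ones from the log output above, and your full list may differ by version:

# List all services and their startup configuration
chkconfig -A

# Flip the required PSC services back on, one at a time
chkconfig vmafdd on
chkconfig vmware-rhttpproxy on
chkconfig vmdird on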

Note that this has to be done for every individual service, and that some of these services rely on each other being configured "on" as well. You will be prompted to enable the service dependency where required, for example:

In this example, enable the service dependency first.

Once complete, confirm that all the services are enabled by running "chkconfig -A".

You should now be able to start all the services:

service-control --start --all


PowerCLI – Perennial-Reservation


Are you hosting any MSCS workloads on ESXi?
You probably forgot to set a perennial reservation. I've used RDMs in small environments and never really noticed any issues with slow ESXi node startups… until recently.

The infrastructure I currently look after has a 3-node MS SQL cluster with more than 25 RDMs presented to it. A few days ago I had to add another two ESXi nodes to the cluster and noticed that the nodes took longer than expected to start up. This issue is described in VMware KB 1016106.
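For a single device, the fix from that KB is a one-liner on each ESXi host (the naa ID below is a placeholder):

# Mark the RDM's backing device as perennially reserved so boot-time scans skip it
esxcli storage core device setconfig -d naa.60050768018052b69800000000000123 --perennially-reserved=true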

Due to the number of RDMs presented to the ESXi cluster, I had to automate the process of perennially reserving the RDMs across all the nodes.

Step 1

Determine which disks are actually RDMs and get their naa IDs.

# Import VIM automation core snapin if not loaded
$imported = Get-PSSnapin -Name VMware.VimAutomation.Core -ErrorAction 'SilentlyContinue'
if (!$imported) {
    Write-Host -ForegroundColor "yellow" "Importing VMware.VimAutomation.Core..."
    try {
        Add-PSSnapin VMware.VimAutomation.Core -ErrorAction 'Stop'
    } catch {
        Write-Host "Something went wrong. ""VMware.VimAutomation.Core"" snapin not found. Try executing the script from PowerCLI."
        Exit
    }
}

$vchost = Read-Host "Please enter your VIserver FQDN/IP"

try {
    Connect-VIServer $vchost -ErrorAction:Stop
    Write-Host -ForegroundColor Green "Connected to VI server"
} catch {
    Write-Host -ForegroundColor Red "Unable to connect to the VI server"
    Exit
}

$vm = Read-Host "Please enter the VM name"

# Find the physical-mode RDMs on the VM and report their canonical names (naa IDs)
$rdms = Get-VM $vm | Get-HardDisk | Where-Object {$_.DiskType -eq "RawPhysical"}
foreach ($rdm in $rdms) {
    $observed = New-Object -TypeName PSObject
    $observed | Add-Member -MemberType NoteProperty -Name Parent -Value $rdm.Parent
    $observed | Add-Member -MemberType NoteProperty -Name Name -Value $rdm.Name
    $observed | Add-Member -MemberType NoteProperty -Name ScsiCanonicalName -Value $rdm.ScsiCanonicalName
    Write-Output $observed
}

Disconnect-VIServer $vchost -Confirm:$false

Step 2

Using the output obtained from step 1, execute this script against your individual ESXi nodes. You will need the "ScsiCanonicalName" values to reserve the devices on each ESXi node. Create a new file containing each naa.ID, similar to this:
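The input file is a plain CSV with a single ScsiCanonicalName column (the header the script below imports); the naa IDs here are placeholders:

ScsiCanonicalName
naa.60050768018052b69800000000000123
naa.60050768018052b69800000000000124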


Now you can start setting the reservation on your ESXi node with this script.

# Load the PowerCLI snapin if it is not already loaded
Add-PSSnapin VMware.VimAutomation.Core -ErrorAction 'SilentlyContinue'
try {
    Get-PSSnapin -Name "VMware.VimAutomation.Core" -ErrorAction:Stop | Out-Null
    Write-Host -ForegroundColor Green "Loaded ""VMware.VimAutomation.Core"" PSSnapin"
} catch {
    Write-Host -ForegroundColor Red "Unable to load the required PS module ""VMware.VimAutomation.Core"""
    Exit
}

$path_to_csv = Read-Host "Your path to the .csv input file"
$vmhost = Read-Host "Please supply your ESXi node's FQDN"
$esxi_admin = Read-Host "Please specify a user with root access"
$esxi_admin_pass = Read-Host "Please specify the password"
$naas = Import-Csv $path_to_csv

Connect-VIServer $vmhost -User $esxi_admin -Password $esxi_admin_pass

$myesxcli = Get-EsxCli -VMHost $vmhost

# setconfig arguments: detached ($false), device (naa ID), perenniallyreserved ($true)
foreach ($naa in $naas) {
    $rdmnaa = $naa.ScsiCanonicalName
    $myesxcli.storage.core.device.setconfig($false, $rdmnaa, $true)
}

Disconnect-VIServer $vmhost -Confirm:$false

Step 3

Verify that your reservations have been set. Grab one or two random naa IDs and check.
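A quick way to check from the ESXi shell (again, the naa ID is a placeholder); the device should report "Is Perennially Reserved: true":

esxcli storage core device list -d naa.60050768018052b69800000000000123 | grep -i "perennially reserved"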


That's it, you're done.


vSAN – Huawei "Ready Nodes", not so ready

On the face of it, this heading may seem as if I'm taking a shot at Huawei. I want to make it clear that this post is in no way intended to bad-mouth Huawei or any other vendor; it is merely here to assist anyone in the community who may face a problem similar to the one I had with my vSAN deployment on this specific hardware platform.

This was my first VMware vSAN deployment and the "nerd" in me was super excited. I knew I would be learning a "whack load" of cool new things. Fortunately I had read "Essential Virtual SAN" by Duncan Epping and Cormac Hogan while studying for my VCAP6-DCD exam, so I had a good technical understanding of how vSAN works and how it should be deployed.

As with many new vSAN deployments, a lot of research goes into finding the correct hardware. VMware has "pre-certified" certain vendor hardware to make our jobs easier; VMware refers to these as "vSAN Ready Nodes", and you can get more info here.

The hardware vendor we used was Huawei, and the model is the "RH2288H V3". We opted for the All-Flash configuration. The most important component with any vendor choice is the storage controller. Because this server was certified by VMware as a ready node, I expected no installation problems. I don't think I was prepared for two days of troubleshooting. Anyway, I'll get to that later…

Before you begin installing anything, there are a couple of things you'll need to change in the BIOS. Every Huawei server's BIOS is protected with the default password "Huawei12#$". This can be disabled or changed within the BIOS setup utility. If you are going to be enabling EVC on your cluster, you will also need to enable "Monitor/MWait".


The next thing we need to do is configure the RAID controller for pass-through. The RAID controller that ships with the RH2288H V3 is Huawei's own "SR 430C / RU 430C", which is based on the LSI3108 chipset. Our vSAN nodes required two disk groups, and for that reason we also got an additional PCIe RAID controller for the second disk group. To get into the RAID card BIOS, press the standard "CTRL-R" during RAID card initialisation. Once in, select your RAID controller and enable JBOD. In my case I had to do this for both controllers. Save your config and restart.

I'm not going to cover how to install ESXi on this node; the process is similar to my post on Installing ESXi on Huawei : RH5885H V3 (FusionServer). The one exception is an updated driver that you can obtain from Huawei's support site here. Download the latest version available; we'll need it later in this post.

Now that you have ESXi installed, you'll need to do all the basic configuration and get it connected to your VCSA. In my deployment I made use of LAGs and therefore required additional configuration to get this up; this is not a requirement for vSAN, and you can use a standard virtual switch.
Once you have your port groups set up and configured, you'll need to configure the vmkernel ports for vSAN. You have the option to create a new TCP/IP stack for vSAN, or you can use the default stack. The latter is mainly used when you have a stretched vSAN cluster and require routing across your vSAN vmkernel ports. In my design this was not a requirement.


The next step is to enable vSAN at the cluster level. It's at this point that I discovered the "vSAN Ready Node" was far from ready for anything.
If you enable vSAN at this point, you will notice that the disk group creation gets stuck at 21% and the host becomes unresponsive.

You will see errors related to the failed disk group creation in the vmkernel.log.

This is a known issue and VMware has a KB for it. Although the KB is for a Dell PERC controller, further research has also linked the issue to Huawei's "SR 430C / RU 430C".

To resolve the issue, I restarted the host and reconfigured the RAID controllers to remove the disk from the host. This can be done by entering the RAID controller configuration utility and disabling  “JBOD”

The host will start up and connect back to vCenter. Now we're going to follow the KB article related to the problem. Enable SSH on the host, connect to it and execute:

esxcli system module set --enabled=false --module=lsi_mr3

This will force ESXi to use the correct driver, which is megaraid_sas.
Restart the host.
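After the reboot you can confirm the driver change took effect before touching vSAN again; a quick sketch:

# lsi_mr3 should now show "Is Enabled: false", while megaraid_sas is loaded
esxcli system module list | grep -iE "lsi_mr3|megaraid_sas"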

I decided to go over Huawei's documentation to see if there were any known issues related to this, and discovered that they have a separate utility called "iDriver". This tool will check all the firmware and driver versions on the host and update them if needed. You can find the tool here.

Extract the tool and copy it to your host using scp. Open another SSH session to your host and execute the installer.

You will be prompted with an iDriver installer menu.


From here you can either choose to automatically install the required driver and firmware versions, or you can validate the current info. I'm going to select option two, but you would select option one if this was a new installation. You can then reboot and use option two to validate the firmware and driver status. You should get output like the example below:


After this you can reboot your ESXi node and reconfigure the RAID controller (re-enable JBOD).

You should now be able to enable and configure vSAN.
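If you prefer to do that last step from PowerCLI instead of the web client, a minimal sketch (the cluster name is a placeholder):

# Enable vSAN with manual disk claiming, so you stay in control of the disk groups
Get-Cluster "vSAN-Cluster" | Set-Cluster -VsanEnabled:$true -VsanDiskClaimMode Manual -Confirm:$false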

PowerCLI – Configure Guest OS IP

Ever wondered how you could configure IPs on a couple of newly created virtual machines using PowerCLI? I created a quick script to configure IP addresses on some RHEL 6 virtual machines.
The script has some specific actions that had to be performed to set the hostname; this was done using sed.

You could edit it to handle other OSes as well. Just note that this approach is not supported on Server 2012; I'll create a new post on how to do this for Server 2012.

Here are some prerequisites before you can actually execute this script:

  • VMware Tools needs to be installed and running.
  • Root/Admin credentials.
  • A .csv file containing the info you would like to inject into the virtual machines (name, ip, mask, gateway, dns1, dns2 – matching the column names the script expects), similar to the sample after this list.
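A hypothetical input file matching those column names (all values are placeholders):

name,ip,mask,gateway,dns1,dns2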

# Import a csv containing all the info.
$vms = Import-Csv "C:\Users\User\Documents\Script\cloud_vms.csv"

# Start processing each VM per line item.
foreach ($vm in $vms) {

    # Enable eth0 at boot inside the guest
    $script_block_eth0 = "sed 's/ONBOOT=no/ONBOOT=yes/' -i /etc/sysconfig/network-scripts/ifcfg-eth0"
    Invoke-VMScript -ScriptText $script_block_eth0 -VM $ -GuestUser root -GuestPassword "P@ssw0rd" -ToolsWaitSecs 20

    # Push the IP configuration through VMware Tools
    Get-VMGuestNetworkInterface -VM $ -Name "eth0" -GuestUser root -GuestPassword "P@ssw0rd" | Set-VMGuestNetworkInterface -Ip $vm.ip -Netmask $vm.mask -Gateway $vm.gateway -Dns $vm.dns1,$vm.dns2 -GuestUser root -GuestPassword "P@ssw0rd" -ToolsWaitSecs 30 -Confirm:$false

    # Swap the template hostname for the VM name
    $vmname = $
    $script_block_hostname = "sed 's/HOSTNAME=template/HOSTNAME=$vmname/' -i /etc/sysconfig/network"
    Invoke-VMScript -ScriptText $script_block_hostname -VM $ -GuestUser root -GuestPassword "P@ssw0rd" -ToolsWaitSecs 20

    # Restart networking to apply everything
    Invoke-VMScript -ScriptText "service network restart" -VM $ -GuestUser root -GuestPassword "P@ssw0rd" -ToolsWaitSecs 20
}

PowerCLI – Migrating DRS rules between vCenter Servers

I haven’t written a blog post in forever, been busy with a big vSphere 6.0 distributed deployment.

As part of the vSphere 6 deployment I had to migrate DRS rules from a Windows-based vCenter deployment to a VCSA-based deployment, and copying all your DRS rules across can be a tedious process. I had a plethora of DRS rules to migrate across multiple clusters and was not looking forward to creating all the rules manually. Luckily Luc Dekens and Matt Boren put together a DRSRule module which made the task easy. More info about the module can be found here.

I have put together a PowerShell script that will migrate all the rules for you.

Note: Updated to use Import-Module instead of Add-PSSnapin for PowerCLI 6.5.x.


Param (
    [string]$my_old_vc,
    [string]$the_cluster,
    [string]$my_new_vc
)

if (!$my_old_vc -or !$the_cluster -or !$my_new_vc) {
    Throw "You need to supply all parameters."
}

# Import the VIM automation core module if not loaded
$imported = Get-Module -Name VMware.VimAutomation.Core -ErrorAction 'SilentlyContinue'

if (!$imported) {
    Write-Host -ForegroundColor "yellow" "Importing VMware.VimAutomation.Core..."
    try {
        Import-Module VMware.VimAutomation.Core -ErrorAction 'Stop'
    } catch {
        Throw "Something went wrong. ""VMware.VimAutomation.Core"" module not found. Try executing the script from PowerCLI."
    }
}

# Import the DRSRule module
try {
    Import-Module DRSRule -ErrorAction 'Stop'
} catch {
    Throw "DRSRule module not found. Did you install it correctly?"
}

Write-Host "Welcome to the DRSRule migration tool."
Write-Host "You will need to have full administrative access to both vCenter management servers to proceed."
Write-Host "============================================="
Write-Host "The info you supplied:"
Write-Host " Old_VC  : $my_old_vc"
Write-Host " New_VC  : $my_new_vc"
Write-Host " Cluster : $the_cluster"
Write-Host "============================================="

Read-Host "Hit Enter to continue."

try {
    Connect-VIServer $my_old_vc -WarningAction 'SilentlyContinue' -ErrorAction 'Stop'
} catch {
    Throw "Unable to connect to the old vCenter Management Server $my_old_vc"
}

Write-Host "Connected to $my_old_vc."

Write-Host "Collecting DRS Information. Please Wait."

# Collect the groups and rules from the source cluster
$drshostgrps = Get-Cluster -Name $the_cluster | Get-DrsVMHostGroup
$drsvmgrps = Get-Cluster -Name $the_cluster | Get-DrsVMGroup
$drs_vm2hosts = Get-Cluster -Name $the_cluster | Get-DrsVMToVMHostRule
$drs_vm2vms = Get-Cluster -Name $the_cluster | Get-DrsVMToVMRule

Write-Host "Done."

Write-Host "Disconnecting from $my_old_vc"

Disconnect-VIServer $my_old_vc -Confirm:$false

try {
    Connect-VIServer $my_new_vc -WarningAction 'SilentlyContinue' -ErrorAction 'Stop'
} catch {
    Throw "Unable to connect to the new vCenter Management Server $my_new_vc"
}

Write-Host "Connected to $my_new_vc."
Write-Host "Start applying DRS Information. Please Wait."

foreach ($drshostgrp in $drshostgrps) {
    New-DrsVMHostGroup -Name:$ -Cluster:$drshostgrp.Cluster -VMHost:$drshostgrp.VMHost
}

Write-Host "DRSRule VMHost Groups created."

foreach ($drsvmgrp in $drsvmgrps) {
    New-DrsVMGroup -Name:$ -Cluster:$drsvmgrp.Cluster -VM:$drsvmgrp.VM
}

Write-Host "DRSRule VM Groups created."

foreach ($drs_vm2host in $drs_vm2hosts) {
    New-DrsVMToVMHostRule -Name:$ -Cluster:$drs_vm2host.Cluster -Enabled:$drs_vm2host.Enabled -Mandatory:$drs_vm2host.Mandatory -VMGroupName:$drs_vm2host.VMGroupName -AffineHostGroupName:$drs_vm2host.AffineHostGroupName -AntiAffineHostGroupName:$drs_vm2host.AntiAffineHostGroupName
}

Write-Host "DRSRule VM to VMHost Rules created."

foreach ($drs_vm2vm in $drs_vm2vms) {
    New-DrsVMToVMRule -Name:$ -Cluster:$drs_vm2vm.Cluster -Enabled:$drs_vm2vm.Enabled -KeepTogether:$drs_vm2vm.KeepTogether -VM:$drs_vm2vm.VM
}

Write-Host "DRSRule VM to VM Rules created."

Write-Host "All done. Please validate all rules manually on the clusters and vCenter management servers."

Write-Host "Disconnecting from the VCs"

Disconnect-VIServer * -Confirm:$false

Write-Host "Disconnected"


Installing ESXi on Huawei : RH5885H V3 (FusionServer)

Writing a blog post about a Huawei server seems strange, but with Huawei growing in popularity, especially in the South African market, it's probably overdue.

In this post I will be covering my experience with my first Huawei server and detailing some issues I experienced during my deployment.

I've had the opportunity to work with the FusionServer (RH5885H V3). For those who do not know what "FusionServer" is, it's basically Huawei punting virtualization on this specific hardware, whether it be VMware, Hyper-V or other popular hypervisors. They probably won't admit to it, but I'm sure they would prefer you to run their own virtualization software, "FusionSphere"; I'll leave it at that. If you would like to know more about FusionSphere, refer to their site here.

These are the maximum configurable specifications for this server:


As you can see, it can take a good amount of resources. The spec I used was four 15-core sockets and 2TB of memory.

Installing ESXi.

According to Huawei, you will need to build your own ESXi image and inject all their drivers into it. I found this a bit of a pain, as most vendors I've worked with provide customized images for all supported versions of ESXi. I'm hoping I can convince Huawei to start doing this.

I used Huawei’s official OS installation guide to build a customized ESXi image. The official OS installation guide can be found here.

You will need four components to build the customized ESXi image.

  • ESXi-Customizer – Available here.
    Andreas Peetz has done some awesome work on this tool. I also recommend you use the new PowerCLI-based version and not the old version referenced in the Huawei documentation.
  • The correct ESXi version you would like to install.
    Available on VMware's site. You need to use the offline .zip bundle.
  • The Huawei driver installation package.
    Available from Huawei's site; follow the Huawei OS installation guide.
  • PowerCLI 5.x or later.
    Also available from VMware's site.

You can follow this video for a detailed "how to" guide on creating your customized image.
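If you just want the short version, the build with the PowerCLI-based ESXi-Customizer-PS boils down to a single command along these lines (the file and folder names are placeholders for the offline bundle and the extracted Huawei drivers):

# Build a customized ISO from the VMware offline bundle plus the Huawei driver VIBs
.\ESXi-Customizer-PS.ps1 -izip .\ -pkgDir .\huawei-drivers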

Preparing the Hardware

Before you start installing ESXi from your newly created media, you need to make some changes to the server's BIOS and, more importantly, update the BIOS firmware. I found that the BIOS was missing a crucial feature that would allow the host to be added to an EVC-enabled cluster.

To update the BIOS firmware, speak to your Huawei representative for assistance. You need to ensure that you are not on a firmware version older than (U117) V322.


Next, you need to configure the default options to enable virtualization in the BIOS as you would on any other server. The one thing in this BIOS you need to make sure is enabled is the "Monitor/MWait" option.

NOTE: Huawei thought it would be a good idea to have a default password on the BIOS, which is "Huawei123$".

This is the reason we had to update the BIOS: this option is not available on the older firmware versions and, as I mentioned before, you won't be able to add this host to an EVC-enabled cluster without it enabled.

Once you have completed all these steps you can start installing your server as normal.


PowerCLI – ESXi CIM Permissions


Using HP's IRS (Insight Remote Support) to monitor your ESXi nodes' hardware?

Setting this up was not as straightforward as you'd think it should be. It turns out that the permissions outlined on VMware's site here are not enough to get HP's IRS to work.

Use this script to configure the correct account and permissions. The variable values at the top (accounts, passwords, paths) are placeholders; adjust them for your environment.

Param ([string]$filename)

# Global environment variables #
$esxi_user = "root"
$esxi_pass = "yourpass"
$irs_user = "hpirs"       # IRS monitoring account to create (placeholder value)
$irs_pass = "yourpass"    # password for the IRS account (placeholder value)
$vi_role = "hpirs-role"   # name of the CIM role to create (placeholder value)
$command = ".\Script\ESXi\cmd.txt"
$putty_exe = ".\Script\ESXi\plink.exe"
$esxi_hosts = Import-Csv $filename

foreach ($esxi_host in $esxi_hosts) {
    try {
        Connect-VIServer $ -User $esxi_user -Password $esxi_pass -ErrorAction:Stop
    } catch {
        Write-Host "Unable to connect to the ESXi node. Please check node connectivity"
        continue
    }

    # Start the SSH service
    Get-VMHost -Name $ | Get-VMHostService | Where-Object {$_.Key -eq "TSM-SSH"} | Start-VMHostService

    # Create the hpirs user account on the ESXi node.
    New-VMHostAccount -Id $irs_user -Password $irs_pass -GrantShellAccess:$false

    # Create a new role with CIM privileges
    New-VIRole $vi_role -Privilege "CIM","System Management"

    # Assign the hpirs user account the role we just created.
    Get-VMHost | New-VIPermission -Principal $irs_user -Role $vi_role -Propagate:$true

    # SSH to the host and apply some security restrictions to the new user.
    $ssh_host = $
    echo y | & $putty_exe -ssh root@$ssh_host -P 22 -pw $esxi_pass -m $command

    # Stop the SSH service
    Get-VMHost -Name $ | Get-VMHostService | Where-Object {$_.Key -eq "TSM-SSH"} | Stop-VMHostService -Confirm:$false

    # Disconnect from the ESXi node.
    Disconnect-VIServer -Confirm:$false
}

Before executing the script, you will need these files in the root of the script directory:

  • Source input file containing the host DNS or IP info (a minimal sample follows this list). You can get a sample here (sample)
  • plink.exe is required to execute some commands on the ESXi node. You can get that here.
  • You need a text file that contains all the commands that are executed remotely on the ESXi node. You can get a sample here (cmd.txt)
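The script only references a name property per host, so a minimal input file could be as simple as this (host names are placeholders):

name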

vCAC – Invalid Credentials

The other day one of my co-workers had some issues logging on to vCAC. It would accept his logon credentials but would just refresh the page and ask him to log on again. Having seen similar issues in the past with the vCenter web logon, it looked like an SSO issue, so I decided to have a look at the identity appliance. Everything looked fine there. I then headed over to the vCAC appliance, where I could see that a couple of the web services visible on the vCAC appliance web logon screen were in a failed state.

Going over the log files, it was very clear what the problem was… invalid credentials.

I then used JXplorer to test my SSO credentials and they were all good.
I had a look at the "administrator@vsphere.local" user in the LDAP browser to check whether the account was locked or expired, and everything looked in order: no lockout, no expired account.

I know what you're thinking… "Just re-register the vCAC appliance with the SSO server." Believe me, I tried; I just kept getting the same error in the log files. The strange thing is that during the process of re-registering the vCAC appliance with the SSO server, it accepts the credentials but then fails with an error message: "Unable to establish a connection to SSO-Appliance.domain.local:7444" (that's not my actual server's DNS name).

A week earlier my colleague had replaced all the certificates with signed certificates from our internal CA. What a mission that was: there doesn't seem to be any good documentation on how to do this, and the couple of guides we did find were helpful but required at least a solid understanding of certificates and how they work. I think I should do a step-by-step guide for this. Anyway, getting back to the problem: after he replaced them, everything worked as expected, for about a week. I created blueprints and deployed new virtual machines without any issues.

So I didn't think it could be the certificates. I had gone over all of them a couple of times and they looked fine.

I decided to do a full restore of both appliances. If this doesn't work, I guess I'll get VMware support to have a look at it.

Will update this post when I eventually fix this.

UPDATE – Problem solved.

I decided to open a case with VMware tech support. The engineer tried to assist, but it turns out our version of vCAC, 6.0.1, had reached end of support. I actually didn't realize this version of vCAC only had 13 months of support. Needless to say, VMware tried to assist but eventually told me to upgrade before they would continue with the support request.

I decided to go over everything again before upgrading. When listing the internal solution users within the keystore on the vCAC appliance, I noticed something wrong with the internal solution users' certificates: some appeared to be expired. (By the way, when asked for a password, just hit Enter.)

vcachost01:/etc/vcac # keytool -list -keystore vcac.keystore

cafe, Mar 28, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): CA:16:72:1F:BD:E8:2B:82:FE:56:85:8D:92:F5:DE:A7:BE:6D:B5:5D
csp-admin, Mar 28, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): C7:C1:17:E9:15:22:06:C0:3B:12:DB:15:43:0E:B0:7D:EA:01:A5:44
websso, Apr 09, 2016, trustedCertEntry,
Certificate fingerprint (SHA1): B4:78:36:CE:E5:C9:74:8F:B8:3B:24:9B:BB:C0:FB:32:8C:62:F2:F9
apache, Mar 29, 2016, PrivateKeyEntry,
Certificate fingerprint (SHA1): 7B:C0:DC:80:78:09:EB:49:90:72:60:AD:61:5D:32:C3:B0:B7:60:3E
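The plain listing above doesn't show validity dates. To see which entries are actually expired, a verbose listing helps (a quick sketch; run it from /etc/vcac and hit Enter at the password prompt):

# Print each alias with its "Valid from ... until ..." line
keytool -list -v -keystore vcac.keystore | grep -E 'Alias name|until'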

This led me to believe that this could be the issue, so this is what I did to resolve it. I'm not sure it is supported by VMware but hey, the product was EOL and I really did not feel like rebuilding the environment on a Friday.

Perform these steps on the vCAC appliance.

vcachost01:cat /etc/vcac/

You should see the internal solution users that have expired certificates. In my case these were the culprits:


I needed these solution users to be recreated with new certificates. Fortunately, when you register vCAC with the SSO identity appliance, these solution users are created if they are missing.

Before editing any files make a copy of the file.

vcachost01:cp /etc/vcac/ /etc/vcac/

Now edit the file:

vcachost01:vi /etc/vcac/

Remove the solution users with expired certificates and save your file.

Now re-register the vCAC appliance back to the identity appliance using the command:

vcac-config -e register-with-sso --tenant vsphere.local --user administrator@vsphere.local --password 'yourpass'

Once this command has executed without any errors your vCAC appliance should be up and running again.

I have decided to plan an upgrade to vRA 7.x in the next couple of weeks and will be doing a detailed upgrade guide.

Please feel free to leave a comment or any suggestions.