I recently stumbled upon this issue when adding some new vNICs for use with a new iSCSI storage array. I followed the usual order of operations for our UCS B-series blades: create the new vNIC templates, add the vNICs to the LAN Connectivity Policy in UCS Manager, put the first vSphere host in the cluster into maintenance mode, shut it down, and then hit the blinky “pending actions” light in UCS to reboot the corresponding service profile. The host comes back up, I take it out of maintenance mode, and DRS migrates some VMs back over. Only this time, vMotion kept failing for the VMs trying to move back to the updated host.
Luckily, I was already working with Cisco TAC on another issue when this reared its ugly head. We took a look and eventually saw that after adding the two vNICs and rebooting, the vmnics had come up slightly out of order in vSphere:
In UCS, I had created the MAC pools so that each address identifies its vmnic by fabric (A or B) and vmnic number. As highlighted above, vmnic8 was showing a MAC address with aa:a4, which meant it “should” actually have been vmnic4 rather than vmnic8. It just so happened that this vmnic was one of my vMotion uplinks. TAC mentioned that this was a known issue still unresolved between Cisco and VMware, and they pointed me to a Cisco bug article that referenced a VMware KB for the fix:
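Spotting the mismatch by eye works for one host, but the same check can be scripted. The sketch below assumes a hypothetical reading of the MAC pool convention, namely that the last hex digit of each MAC is the intended vmnic number (as in the aa:a4 to vmnic4 example), so it only covers vmnic0 through vmnic9. The helper names `expected_vmnic` and `check_nic_order` are made up for illustration; the input is the output of `esxcli network nic list`, whose MAC column may sit at a different position depending on the ESXi release, so the MAC is located by its shape.

```shell
# Sketch only. HYPOTHETICAL convention: the final hex digit of each pool MAC
# is the intended vmnic number ("...:aa:a4" -> vmnic4); vmnic0-vmnic9 only.

# Print the vmnic name a MAC "should" belong to; fail if the last digit
# is not 0-9.
expected_vmnic() {
    in_mac=$1
    last_digit=${in_mac#"${in_mac%?}"}   # keep only the final character
    case "$last_digit" in
        [0-9]) printf 'vmnic%s\n' "$last_digit" ;;
        *) return 1 ;;
    esac
}

# Read "esxcli network nic list" output on stdin and report every vmnic
# whose MAC points at a different vmnic number.
check_nic_order() {
    while read -r name rest; do
        case "$name" in vmnic*) ;; *) continue ;; esac   # skip header rows
        mac=""
        for field in $rest; do
            # a MAC is six colon-separated pairs, e.g. 00:25:b5:00:aa:a4
            case "$field" in ??:??:??:??:??:??) mac=$field ;; esac
        done
        [ -n "$mac" ] || continue
        want=$(expected_vmnic "$mac") || continue
        if [ "$want" != "$name" ]; then
            printf '%s has MAC %s, expected %s\n' "$name" "$mac" "$want"
        fi
    done
}
```

On the host this would be run as `esxcli network nic list | check_nic_order`; no output means every vmnic matches the assumed convention.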
The KB says to SSH into the affected vSphere host and run “localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list” to output the vmnic aliases and their corresponding bus addresses:
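Rather than copying bus addresses out of that listing by eye, a small helper can pull out the address behind one alias. This is a minimal sketch that assumes each row of the alias list reads bus type, bus address, alias, with the PCI rows (the ones `--bus-type pci` acts on) marked `pci`. `list_aliases` and `bus_for_alias` are hypothetical helper names, not part of localcli.

```shell
# Sketch. ASSUMPTION: alias-list rows look like
#   "<bus type> <bus address> <alias>"
# and the PCI rows are the ones whose first column is "pci".

# Dump the current alias table (the KB's command; run on the ESXi host).
list_aliases() {
    localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias list
}

# Read alias-list output on stdin; print the pci bus address for alias $1.
bus_for_alias() {
    awk -v want="$1" '$1 == "pci" && $3 == want { print $2 }'
}
```

For example, `list_aliases | bus_for_alias vmnic8` would print the bus address to hand to the store command for the misnamed NIC.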
Per the article, you change the alias assigned to a particular bus address by running another localcli command. In this instance, I ran:
localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type pci --alias vmnic4 --bus-address s00000000:07:00
In other words, this assigns the alias vmnic4 to the bus address currently assigned to vmnic8. The first screenshot shows that, based on its MAC address, what is currently vmnic8 really should be vmnic4.
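Since a typo in the alias or bus address would be written straight into the host's configuration, it can help to wrap the store command so the arguments are checked and the command is shown before anything changes. A minimal sketch; `store_alias` and the `DRY_RUN` variable are hypothetical names, and defaulting to a dry run is a choice made for this sketch.

```shell
# Sketch: a guarded wrapper around the KB's "alias store" command.
# DRY_RUN defaults to 1, which only prints the command; set DRY_RUN=0 to
# actually store the alias.
store_alias() {
    alias_name=$1
    bus_address=$2
    case "$alias_name" in
        vmnic[0-9]*) ;;
        *) echo "refusing: '$alias_name' is not a vmnic alias" >&2; return 1 ;;
    esac
    if [ -z "$bus_address" ]; then
        echo "refusing: no bus address given for $alias_name" >&2
        return 1
    fi
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type pci --alias $alias_name --bus-address $bus_address"
    else
        localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal \
            alias store --bus-type pci --alias "$alias_name" --bus-address "$bus_address"
    fi
}
```

`store_alias vmnic4 s00000000:07:00` prints the exact command from the post; running it again with `DRY_RUN=0` would execute it.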
The command then had to be run four more times (using the full CLI syntax, of course):
change alias vmnic4 to vmnic5
change alias vmnic5 to vmnic6
change alias vmnic6 to vmnic7
change alias vmnic7 to vmnic8
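The five changes above form a rotation: the bus address named vmnic8 becomes vmnic4, and vmnic4 through vmnic7 each move up by one. The sketch below builds that whole list of store commands from a single snapshot of the alias list, so every lookup uses the original mapping regardless of whether the listing reflects a store before the reboot. It only prints commands and assumes the same "pci <bus address> <alias>" row layout as before; `plan_rotation`, `bus_of` and `store_cmd` are hypothetical names.

```shell
# Sketch: plan the alias shuffle from ONE snapshot of the alias list.
# ASSUMPTION: alias-list rows look like "pci <bus address> <alias>".
# Nothing on the host is changed; the store commands are only printed.

# Print the pci bus address for vmnic$2 from alias-list text in $1.
bus_of() {
    printf '%s\n' "$1" | awk -v want="vmnic$2" '$1 == "pci" && $3 == want { print $2 }'
}

# Print one store command for alias vmnic$1 on bus address $2.
store_cmd() {
    echo "localcli --plugin-dir /usr/lib/vmware/esxcli/int/ deviceInternal alias store --bus-type pci --alias vmnic$1 --bus-address $2"
}

# Usage: plan_rotation FIRST LAST < alias-list-output
plan_rotation() {
    first=$1
    last=$2
    snapshot=$(cat)                     # read the alias list once, up front
    bus=$(bus_of "$snapshot" "$last")
    if [ -z "$bus" ]; then
        echo "no pci entry for vmnic$last" >&2
        return 1
    fi
    plan=$(store_cmd "$first" "$bus")   # vmnic$last -> vmnic$first
    n=$first
    while [ "$n" -lt "$last" ]; do
        bus=$(bus_of "$snapshot" "$n")
        if [ -z "$bus" ]; then
            echo "no pci entry for vmnic$n" >&2
            return 1
        fi
        # vmnic$n -> vmnic$((n + 1))
        plan="$plan
$(store_cmd $((n + 1)) "$bus")"
        n=$((n + 1))
    done
    printf '%s\n' "$plan"               # printed only if every lookup worked
}
```

On the host, piping the KB's `alias list` output into `plan_rotation 4 8` would print the five commands in the same order as the list above, ready to review before running any of them.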
vmnic9 was already correct. After making all the alias changes, I rebooted the host again, and it came back up showing the proper order:
The host came out of maintenance mode, vMotions worked fine and all was right with the world.
I haven’t done much digging to confirm whether this is a known issue tied to certain versions of vSphere/UCS. If anyone else has run into this before, or has any info beyond what I heard from TAC (that it has yet to be worked out between VMware and Cisco), definitely post a comment with any updates.