Archive for December 2012

ESXi/ESX 4.x and 5.x hosts with visibility to RDM LUNs used by MSCS nodes may take a long time to boot or to rescan LUNs

The Problem

Some of our IBM x3850 servers running VMware ESXi 4.x were taking a long time to boot, in the region of 30 minutes, which was unacceptable during upgrades and general maintenance. We are running vSphere 4.1 U3.

The Explanation

During boot, the storage mid-layer attempts to discover every device presented to the ESXi host as part of the device claiming phase. MSCS LUNs that hold a persistent SCSI reservation prolong this phase: the host cannot interrogate such a LUN because the reservation is held by an active MSCS node running on another ESXi host.

Marking a device as perennially reserved (the vSphere 5.x fix described below) is a per-host setting, so it must be applied on every ESXi host that has visibility to a device participating in an MSCS cluster.

Solution for VMware vSphere 4.X

Change the relevant advanced configuration option on each affected ESXi/ESX host to speed up the boot process:

  • ESXi/ESX 4.1: Change the advanced option Scsi.CRTimeoutDuringBoot to 1
  • ESXi/ESX 4.0: Change the advanced option Scsi.UWConflictRetries to 80
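
The same change can probably be made from the host console instead of the vSphere Client. The following is a minimal sketch, assuming esxcfg-advcfg is available on the host and that the options live under the /Scsi/ path; apply_adv_option and DRY_RUN are hypothetical names used only for illustration. By default it only prints the command so it can be checked first.

```shell
#!/bin/sh
# Sketch only: apply the 4.x boot-time workaround from the host console.
# apply_adv_option and DRY_RUN are illustrative names, not VMware tooling.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the command, 0 = run it on the host

apply_adv_option() {
    # $1 = value to set, $2 = option path (assumed: /Scsi/<OptionName>)
    if [ "$DRY_RUN" = "1" ]; then
        echo "esxcfg-advcfg -s $1 $2"
    else
        esxcfg-advcfg -s "$1" "$2"
    fi
}

# ESXi/ESX 4.1 hosts
apply_adv_option 1 /Scsi/CRTimeoutDuringBoot

# ESXi/ESX 4.0 hosts: use this line instead of the one above
# apply_adv_option 80 /Scsi/UWConflictRetries
```

Run it with DRY_RUN=0 once the printed command looks right for the host's version.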

We also adjusted a setting in the BIOS:

  • Log on to the server's IMM (see the server list for the IMM IP address), open a remote control session and reboot the server.
  • Press F1 when prompted to enter the BIOS setup.
  • Go to System Settings > Devices and I/O Ports > Enable/Disable Adapter Option ROM Support.
  • Disable the UEFI option ROM for any empty slots.

Solution for VMware vSphere 5.X

  1. Determine which RDM LUNs are part of an MSCS cluster.
  2. From the vSphere Client, select a virtual machine that has a mapping to the MSCS cluster RDM devices.
  3. Edit your virtual machine settings and navigate to your Mapped RAW LUNs.
  4. Select Manage Paths to display the device properties of the Mapped RAW LUN and the device identifier (that is, the naa ID)
  5. Take note of the naa ID, which is a globally unique identifier for your shared device.
  6. Connect to the host over SSH (for example with PuTTY) and run the following command once for each RDM disk.

Example: database server "Server 1" with 4 RDM LUNs

  • esxcli storage core device setconfig -d naa.60050768028080befc000000000000z1 --perennially-reserved=true
  • esxcli storage core device setconfig -d naa.60050768028080befc000000000000z2 --perennially-reserved=true
  • esxcli storage core device setconfig -d naa.60050768028080befc000000000000z3 --perennially-reserved=true
  • esxcli storage core device setconfig -d naa.60050768028080befc000000000000z4 --perennially-reserved=true
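
With more than a handful of RDMs, typing one command per disk gets tedious and has to be repeated on every host. A rough sketch of how the commands above could be wrapped in a loop follows; mark_perennially_reserved and DRY_RUN are hypothetical names, and the naa IDs are the example values from this post, so substitute your own.

```shell
#!/bin/sh
# Sketch only: mark several MSCS RDM devices as perennially reserved.
# mark_perennially_reserved and DRY_RUN are illustrative names.
DRY_RUN="${DRY_RUN:-1}"   # 1 = print the commands, 0 = run them on the host

mark_perennially_reserved() {
    # Each argument is one naa ID noted in step 5.
    for naa in "$@"; do
        if [ "$DRY_RUN" = "1" ]; then
            echo "esxcli storage core device setconfig -d $naa --perennially-reserved=true"
        else
            esxcli storage core device setconfig -d "$naa" --perennially-reserved=true \
                || echo "failed to update $naa" >&2
        fi
    done
}

# Example IDs from this post; replace with the IDs of your own RDM LUNs.
mark_perennially_reserved \
    naa.60050768028080befc000000000000z1 \
    naa.60050768028080befc000000000000z2
```

Because the setting is local to each host, the same script would need to be run on every ESXi host that can see the devices.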

Confirm that the correct devices are marked as perennially reserved by running the command below. Each marked device should report Is Perennially Reserved: true.

  • esxcli storage core device list | less
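
Paging through the full device list is slow on hosts with many LUNs. As a rough sketch, the output could be filtered down to just the device IDs that are flagged. This assumes awk is available in the host shell and that each device ID starts at the beginning of a line with its properties indented underneath; list_perennially_reserved is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: print the IDs of devices flagged as perennially reserved.
# Reads "esxcli storage core device list" output on stdin.
# list_perennially_reserved is an illustrative name.
list_perennially_reserved() {
    awk '
        # Assumed layout: an unindented line starts a new device entry.
        /^[^ \t]/ { dev = $1 }
        # Indented property line for the current device.
        /Is Perennially Reserved: true/ { print dev }
    '
}

# Usage on the host:
#   esxcli storage core device list | list_perennially_reserved
```

The printed IDs can then be compared against the naa IDs noted in step 5.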

More Information

http://kb.vmware.com/externalId=1016106

http://www-947.ibm.com/support