The Issue
Following a networking change, a warm start occurred on our IBM V7000 storage node canisters. This caused an outage in the VMware environment: locks on certain LUNs produced a mini-APD (All Paths Down) condition. This issue occurs when an ESXi/ESX host cannot reserve the LUN. The LUN may be locked by another host (an ESXi/ESX host or any other server that has access to the LUN). Typically, there is nothing queued for the LUN. The reservation is done at the SCSI level.
Caution: The reserve, release, and reset commands can interrupt the operations of other servers on a storage area network (SAN). Use these commands with caution.
Note: LUN resets are used to remove all SCSI-2 reservations on a specific device. A LUN reset does not affect any virtual machines that are running on the LUN.
Instructions
- SSH into the host and run esxcfg-scsidevs -c to verify that the LUN was detected by the ESX host at boot time. If the LUN is not listed, rescan the storage.
- Next, run less /var/log/vmkernel.log (Shift+G is a pager shortcut, so it does nothing when the file is printed with cat)
- Press Shift+G to reach the end of the file
- You will see messages in the log such as below
- 2015-01-23T18:59:57.061Z cpu63:32832)lpfc: lpfc_scsi_cmd_iocb_cmpl:2057: 3:(0):3271: FCP cmd x16 failed <0/4> sid x0b2700, did x0b1800, oxid xffff SCSI Reservation Conflict
- You will need to find the naa ID or the vml ID of the LUNs you need to reset.
- You can do this by running the command esxcfg-info | egrep -B5 "s Reserved|Pending"
- The host that has Pending Reserves with a value that is larger than 0 is holding the lock.
- We then had to run the below command to reset the LUNs
- vmkfstools -L lunreset /vmfs/devices/disks/naa.60050768028080befc00000000000116
- Then run vmkfstools -V to rescan
- Occasionally you may need to restart the management services on particular hosts by running /sbin/services.sh restart in a PuTTY session and then restart the vCenter service, depending on your individual situation.
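
The two detection steps above (spotting reservation conflicts in the log and finding the host with pending reserves) can be sketched as small shell helpers. This is only an illustrative sketch: the function names count_reservation_conflicts and pending_reserves_nonzero are hypothetical, and the second helper assumes that each "Pending Reserves" line in the esxcfg-info output ends with its numeric value, which may differ between ESXi versions.

```shell
#!/bin/sh
# Hypothetical helpers; the names and the assumed output layout are
# illustrative and not part of any VMware tooling.

# Count "SCSI Reservation Conflict" messages in a vmkernel log file.
# Defaults to /var/log/vmkernel.log when no path is given.
count_reservation_conflicts() {
    grep -c "SCSI Reservation Conflict" "${1:-/var/log/vmkernel.log}"
}

# Read esxcfg-info style text on stdin and print only the
# "Pending Reserves" lines whose trailing number is greater than 0.
# Assumption: the value is the last thing on the line,
# e.g. "|----Pending Reserves.....2".
pending_reserves_nonzero() {
    awk '/Pending Reserves/ {
        n = $0
        sub(/.*[^0-9]/, "", n)   # strip everything up to the last non-digit
        if (n + 0 > 0) print
    }'
}
```

On a host, these might be used as count_reservation_conflicts /var/log/vmkernel.log and esxcfg-info | pending_reserves_nonzero. Neither helper issues a reserve, release, or reset; they only read output, so the LUN reset itself remains a deliberate manual step.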