Archive for Objective 1 Storage

Apply space utilization data to manage storage resources

The df Command

Used for checking disk space usage on the ESX/ESXi service console partitions

This objective on the VCAP-DCA exam is slightly ambiguous, but I have decided to focus on the command “df”, which stands for disk filesystem or disk free. This command displays the filesystems that are mounted on a particular host and the usage associated with each of them

The “-h” switch makes the output human readable

Instructions

  • Log into an SSH session, the vMA or locally, whichever you prefer
  • Type df -h to view the output in an easily readable format

df

  • Review the Use% for each of the listed items. If any of the volumes listed are 100% full, they must be investigated to determine if space can be freed. The most important mount points to investigate on a default installation of ESX are the / and /var/log mounts because if they are full they can prevent proper operation of the ESX host
  • VisorFS is a Special-purpose File System for Efficient Handling of System Images
  • The vfat partition is created during the autoconfiguration phase of new ESXi installations. In vSphere 5 it is a 4GB vfat scratch partition
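As a rough sketch (assuming the standard BusyBox awk available in the ESX/ESXi shell), you can combine df with a simple filter so that only the header plus any filesystem at 90% usage or more is shown:

df -h | awk 'NR==1 || $5+0 >= 90'

Anything returned by the filter is a candidate for the investigation described above.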

The vdf Command

Checking disk space usage on a VMFS volume of an ESX/ESXi host

  • Log into an SSH session, the vMA or locally, whichever you prefer
  • Type vdf -h to view the output in an easily readable format

Capture4

  •  Review the Use% for each of the listed items. If any of the volumes listed are 100% full, they must be investigated to determine if space can be freed. If a VMFS volume is full you cannot create any new virtual machines and any virtual machines that are using snapshots may fail

Useful VMware KB Article

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1003564

Identify available Storage Monitoring Tools, metrics and alarms

Tools within vCenter

  • Storage Views > Reports

test

These are the settings you can select for the fields

perf4

  • Storage Views > Maps

maps2

  • Datastores View > Performance

perf

perf2

  • Datastores Views > Alarms

alarm1

alarm2

  • vscsiStats

Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats. vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel. This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:

  • IO size
  • Seek distance
  • Outstanding IOs
  • Latency (in microseconds)
  • ESXTOP/RESXTOP

Type V to view virtual machines, then type d to view disk statistics per HBA. Type f to add extra fields to view other statistics

hba

hba2

Type V to view virtual machines, then type u to view disk statistics per device. Type f to add extra fields to view other statistics

device1

device2
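If you need to capture these statistics over time rather than watch them interactively, esxtop/resxtop can also be run in batch mode. As a sketch (the server name, delay, iteration count and output path are just examples), the following collects 12 samples at 5-second intervals into a CSV file that can then be analysed in perfmon or a spreadsheet:

resxtop --server esxihost.example.com -b -d 5 -n 12 > /tmp/esxtopcapture.csv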

General Disk Statistics

  • CMDS/s – Number of commands issued per second.
  • READS/s – Number of read commands issued per second.
  • WRITES/s – Number of write commands issued per second.
  • MBREAD/s – Megabytes read per second.
  • MBWRTN/s – Megabytes written per second.
  • KAVG – These counters track the latencies caused by the ESX Kernel’s command processing. The KAVG value should be very small in comparison to the DAVG value and should be close to zero. When there is a lot of queuing in ESX, KAVG can be as high as, or even higher than, DAVG. If this happens, check the queue statistics, which are discussed next
  • DAVG – This is the latency seen at the device driver level. It includes the roundtrip time between the HBA and the storage. DAVG is a good indicator of performance of the backend storage. If IO latencies are suspected to be causing performance problems, DAVG should be examined. Compare IO latencies with corresponding data from the storage array. If they are close, check the array for misconfiguration or faults. If not, compare DAVG with corresponding data from points in between the array and the ESX Server, e.g., FC switches. If this intermediate data also matches DAVG values, it is likely that the storage is under-configured for the application. Adding disk spindles or changing the RAID level may help in such cases.
  • GAVG – This is the round-trip latency that the guest sees for all IO requests sent to the virtual storage device. GAVG = KAVG + DAVG.
  • QAVG – The average queue latency. QAVG is part of KAVG. Response time is the sum of the time spent in queues in the storage stack and the service time spent by each resource in servicing the request. The largest component of the service time is the time spent in retrieving data from physical storage. If QAVG is high, another line of investigation is to examine the queue depths at each level in the storage stack
  • AQLEN – The storage adapter queue depth. This is the maximum number of ESX Server VMKernel active commands that the adapter driver is configured to support.
  • LQLEN – The LUN queue depth. This is the maximum number of ESX Server VMKernel active commands that the LUN is allowed to have. (Note that, in this document, the terminologies of LUN and Storage device can be used interchangeably.)
  • WQLEN – The World queue depth. This is the maximum number of ESX Server VMKernel active commands that the World is allowed to have. Note that this is a per LUN maximum for the World
  • ACTV – The number of commands in the ESX Server VMKernel that are currently active. This statistic is only applicable to worlds and LUNs.
  • QUED – The number of commands in the VMKernel that are currently queued. This statistic is only applicable to worlds and LUNs. Queued commands are commands waiting for an open slot in the queue. A large number of queued commands may be an indication that the storage system is overloaded. A sustained high value for the QUED counter signals a storage bottleneck which may be alleviated by increasing the queue depth. Check that LOAD < 1 after increasing the queue depth. This should also be accompanied by improved performance in terms of increased cmd/s. Note that there are queues in different storage layers. You might want to check the QUED stats for devices, and worlds
  • %USD – The percentage of queue depth used by ESX Server VMKernel active commands. This statistic is only applicable to worlds and LUNs. %USD = ACTV / QLEN * 100% For world stats, WQLEN is used as the denominator. For LUN (aka device) stats, LQLEN is used as the denominator. %USD is a measure of how many of the available command queue “slots” are in use. Sustained high values indicate the potential for queueing; you may need to adjust the queue depths for the system’s HBAs if QUED is also found to be consistently > 1 at the same time. Queue sizes can be adjusted in a few places in the IO path and can be used to alleviate performance problems related to latency. For detailed information on this topic please refer to the VMware whitepaper entitled “Scalable Storage Performance”
  • LOAD – The ratio of the sum of VMKernel active commands and VMKernel queued commands to the queue depth. This statistic is only applicable to worlds and LUNs. The sum of the active and queued commands gives the total number of outstanding commands issued by that virtual machine. The LOAD counter is the ratio of this value to the queue depth. If LOAD > 1, check the value of the QUED counter.
  • ABRTS/s – The number of commands aborted per second. It can indicate that the storage system is unable to meet the demands of the guest operating system. Abort commands are issued by the guest when the storage system has not responded within an acceptable amount of time, e.g. 60 seconds on some Windows OSes. Also, resets issued by a guest OS on its virtual SCSI adapter will be translated to aborts of all the commands outstanding on that virtual SCSI adapter
  • RESETS/s – The number of commands reset per second.
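To tie the queueing counters together with a purely illustrative example: if LQLEN is 32, ACTV is 28 and QUED is 4, then %USD = 28 / 32 * 100 = 87.5% and LOAD = (28 + 4) / 32 = 1.0, so the LUN queue is effectively saturated and the QUED and latency counters should be examined as described above.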

Identify storage provisioning methods

Overview of Storage Provisioning methods

Storage

Types of Storage

Local (Block Storage)

Local storage can be internal hard disks located inside your ESXi host, or it can be external storage systems located outside the host and connected to it directly through protocols such as SAS or SATA. The host uses a single connection to a storage disk. On that disk, you can create a VMFS datastore, which you use to store virtual machine disk files. Although this storage configuration is possible, it is not a recommended topology. Using single connections between storage arrays and hosts creates single points of failure (SPOF) that can cause interruptions when a connection becomes unreliable or fails.

ESXi supports a variety of internal or external local storage devices, including SCSI, IDE, SATA, USB, and SAS storage systems. Regardless of the type of storage you use, your host hides the physical storage layer from virtual machines

Local

Networked Storage

Networked storage consists of external storage systems that your ESXi host uses to store virtual machine files remotely. Typically, the host accesses these systems over a high-speed storage network.
Networked storage devices are shared. Datastores on networked storage devices can be accessed by multiple hosts concurrently. ESXi supports the following networked storage technologies.

FC (Block Storage)

Stores virtual machine files remotely on an FC storage area network (SAN). FC SAN is a specialized high-speed network that connects your hosts to high-performance storage devices. The network uses Fibre Channel protocol to transport SCSI traffic from virtual machines to the FC SAN devices.
To connect to the FC SAN, your host should be equipped with Fibre Channel host bus adapters (HBAs). Unless you use Fibre Channel direct connect storage, you need Fibre Channel switches to route storage traffic.

FCOE (Block Storage)

If your host contains FCoE (Fibre Channel over Ethernet) adapters, you can connect to your shared Fibre Channel devices by using an Ethernet network.

FC

Internet SCSI (iSCSI) (Block Storage)

Stores virtual machine files on remote iSCSI storage devices. iSCSI packages SCSI storage traffic into the TCP/IP protocol so that it can travel through standard TCP/IP networks instead of the specialized FC network. With an iSCSI connection, your host serves as the initiator that communicates with a target, located in remote iSCSI storage systems. ESXi offers the following types of iSCSI connections:

  • Hardware iSCSI Your host connects to storage through a third-party adapter capable of offloading the iSCSI and network processing. Hardware adapters can be dependent or independent. This is shown on the left adapter in the picture below
  • Software iSCSI Your host uses a software-based iSCSI initiator in the VMkernel to connect to storage. With this type of iSCSI connection, your host needs only a standard network adapter for network connectivity. This is shown on the right adapter in the picture below

iSCSI

Network-attached Storage (NAS) (File Level Storage)

Stores virtual machine files on remote file servers accessed over a standard TCP/IP network. The NFS client built into ESXi uses Network File System (NFS) protocol version 3 to communicate with the NAS/NFS servers. For network connectivity, the host requires a standard network adapter.

NFS

Comparison of Storage Features

storage2

Predictive and Adaptive Schemes for Datastores

When setting up storage for ESXi systems, before creating VMFS datastores, you must decide on the size and number of LUNs to provision. You can experiment using the predictive scheme and the adaptive scheme.

Predictive

  • Provision several LUNs with different storage characteristics.
  • Create a VMFS datastore on each LUN, labeling each datastore according to its characteristics.
  • Create virtual disks to contain the data for virtual machine applications in the VMFS datastores created on LUNs with the appropriate RAID level for the applications’ requirements.
  • Use disk shares to distinguish high-priority from low-priority virtual machines.

NOTE: Disk shares are relevant only within a given host. The shares assigned to virtual machines on one host have no effect on virtual machines on other hosts.

  • Run the applications to determine whether virtual machine performance is acceptable.

Adaptive

When setting up storage for ESXi hosts, before creating VMFS datastores, you must decide on the number and size of LUNs to provision. You can experiment using the adaptive scheme.

  • Provision a large LUN (RAID 1+0 or RAID 5), with write caching enabled.
  • Create a VMFS on that LUN.
  • Create four or five virtual disks on the VMFS.
  • Run the applications to determine whether disk performance is acceptable
  • If performance is acceptable, you can place additional virtual disks on the VMFS. If performance is not acceptable, create a new, large LUN, possibly with a different RAID level, and repeat the process. Use migration so that you do not lose virtual machine data when you recreate the LUN.

Tools for provisioning storage

  • vSphere Client
  • Web Client
  • vmkfstools
  • SAN Vendor Tools

VMware Link

http://www.vmware.com/files/pdf/techpaper/Storage_Protocol_Comparison.pdf

 

vmkfstools

Monitoring

What can you use vmkfstools for?

You use vmkfstools to:

  • Create and manipulate virtual disks
  • Create and manipulate file systems
  • Create and manipulate logical volumes
  • Create and manipulate physical storage devices on an ESX/ESXi host
  • Create and manage a virtual machine file system (VMFS) on a physical partition of a disk, and manipulate files, such as virtual disks, stored on VMFS-3 and NFS
  • Set up and manage raw device mappings (RDMs)

The long and single-letter forms of the options are equivalent. For example, the following commands are identical.

example 1

example2
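The screenshots above illustrate this equivalence. As a minimal sketch (the datastore path and disk size are illustrative), both of the following create a 4GB virtual disk, one using the single-letter form and one using the long form:

vmkfstools -c 4g /vmfs/volumes/datastore1/testvm/testvm.vmdk

vmkfstools --createvirtualdisk 4g /vmfs/volumes/datastore1/testvm/testvm.vmdk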

Options

  • Type vmkfstools --help

vmkfs8

Great vmkfstools Link

http://vmetc.com/wp-content/uploads/2007/11/man-vmkfstools.txt

 

Upgrade VMware Storage Infrastructure

VMFS1

When upgrading from vSphere 4 to vSphere 5, it is not required to upgrade datastores from VMFS-3 to VMFS-5. This might be relevant if a subset of ESX/ESXi 4 hosts will remain in your environment. When the decision is made to upgrade datastores from version 3 to version 5 note that the upgrade process can be performed on active datastores, with no disruption to running VMs

Benefits

  • Unified 1MB File Block Size

Previous versions of VMFS used 1,2,4 or 8MB file blocks. These larger blocks were needed to create large files (>256GB). These large blocks are no longer needed for large files on VMFS-5. Very large files can now be created on VMFS-5 using 1MB file blocks.

  • Large Single Extent Volumes

In previous versions of VMFS, the largest single extent was 2TB. With VMFS-5, this limit is now 64TB.

  • Smaller Sub-Block

VMFS-5 introduces a smaller sub-block. This is now 8KB rather than the 64KB we had in previous versions. Now small files < 8KB (but > 1KB) in size will only consume 8KB rather than 64KB. This will reduce the amount of disk space being stranded by small files.

  • Small File Support

VMFS-5 introduces support for very small files. For files less than or equal to 1KB, VMFS-5 uses the file descriptor location in the metadata for storage rather than file blocks. When they grow above 1KB, these files will then start to use the new 8KB sub blocks. This will again reduce the amount of disk space being stranded by very small files.

  • Increased File Count

VMFS-5 introduces support for greater than 100,000 files, a three-fold increase on the number of files supported on VMFS-3, which was 30,000.

  • ATS Enhancement

This Hardware Acceleration primitive, Atomic Test & Set (ATS), is now used throughout VMFS-5 for file locking. ATS is part of VAAI (the vSphere Storage APIs for Array Integration). This enhancement improves file locking performance over previous versions of VMFS.

Considerations for Upgrade

  • If your datastores were formatted with VMFS2 or VMFS3, you can upgrade the datastores to VMFS5.
  • To upgrade a VMFS2 datastore, you use a two-step process that involves upgrading VMFS2 to VMFS3 first. Because ESXi 5.0 hosts cannot access VMFS2 datastores, use a legacy host, ESX/ESXi 4.x or earlier, to access the VMFS2 datastore and perform the VMFS2 to VMFS3 upgrade.
  • After you upgrade your VMFS2 datastore to VMFS3, the datastore becomes available on the ESXi 5.0 host, where you complete the process of upgrading to VMFS5.
  • When you upgrade your datastore, the ESXi file-locking mechanism ensures that no remote host or local process is accessing the VMFS datastore being upgraded. Your host preserves all files on the datastore
  • The datastore upgrade is a one-way process. After upgrading your datastore, you cannot revert it back to its previous VMFS format.
  • Verify that the volume to be upgraded has at least 2MB of free blocks available and 1 free file descriptor.
  • All hosts accessing the datastore must support VMFS 5
  • You cannot upgrade VMFS3 volumes to VMFS5 remotely with the vmkfstools command included in vSphere CLI.

Comparing VMFS3 and VMFS5

VMFS5

Instructions for upgrading

  • Log in to the vSphere Client and select a host from the Inventory panel.
  • Click the Configuration tab and click Storage.
  • Select the VMFS3 datastore.
  • Click Upgrade to VMFS5.

vmfs4

  • A warning message about host version support appears.
  • Click OK to start the upgrade.

vmfs6

  • The task Upgrade VMFS appears in the Recent Tasks list.
  • Perform a rescan on all hosts that are associated with the datastore.

Upgrading via ESXCLI

  • esxcli storage vmfs upgrade -l volume_name

esxcli1
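To confirm the VMFS version before and after the upgrade, you can query the volume with vmkfstools (the datastore name is illustrative):

vmkfstools -P -h /vmfs/volumes/datastore1

The first line of the output reports the file system version, e.g. VMFS-3.xx before the upgrade and VMFS-5.xx afterwards.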

Other considerations

  • The maximum size of a VMDK on VMFS-5 is still 2TB – 512 bytes.
  • The maximum size of a non-passthru (virtual) RDM on VMFS-5 is still 2TB – 512 bytes.
  • The maximum number of LUNs that are supported on an ESXi 5.0 host is still 256
  • There is now support for passthru RDMs to be ~ 60TB in size.
  • Non-passthru RDMs are still limited to 2TB – 512 bytes.
  • Both upgraded VMFS-5 & newly created VMFS-5 support the larger passthru RDM.

Prepare storage for maintenance

under_maintenance

Sometimes you will need to perform maintenance on a Datastore which will require placing it in Maintenance Mode and unmounting/remounting it

When you unmount a datastore, it remains intact, but can no longer be seen from the hosts that you specify. The datastore continues to appear on other hosts, where it remains mounted

Instructions

  1. Click Hosts and Clusters View from Home
  2. Select the Host with the attached datastore
  3. Click the Configuration tab
  4. Click on Storage within the Hardware frame
  5. Locate the Datastore to unmount
  6. Right click the datastore and select Properties
  7. Uncheck Enabled under Storage I/O Control and then click Close
  8. Right click the datastore and select Enter SDRS Maintenance Mode
  9. Right Click the Datastore and select Unmount. You should be greeted by this screen warning

unmount

  • Note: The Detach function must be performed on a per-host basis and does not propagate to other hosts in vCenter Server. If a LUN is presented to an initiator group or storage group on the SAN, the Detach function must be performed on every host in that initiator group before unmapping the LUN from the group on the SAN. Failing to follow this step results in an all-paths-down (APD) state for those hosts in the storage group on which Detach was not performed for the LUN being unmapped

Unmounting a LUN from the command line

  • Type esxcli storage filesystem list
  • The output will look like the below

unmount2

  • Unmount the datastore by running the command:
  • esxcli storage filesystem unmount [-u UUID | -l label | -p path ]
  • For example, use one of these commands to unmount the LUN01 datastore:

esxcli storage filesystem unmount -l LUN01

esxcli storage filesystem unmount -u 4e414917-a8d75514-6bae-0019b9f1ecf4

esxcli storage filesystem unmount -p /vmfs/volumes/4e414917-a8d75514-6bae-0019b9f1ecf4

  • To verify that the datastore has been unmounted, run the command:
  • esxcli storage filesystem list
  • The output is similar to:

unmount4

  • Note that the Mounted field is set to false, the Type field is set to VMFS-unknown version, and that no Mount Point exists.
  • Note: The unmounted state of the VMFS datastore persists across reboots. This is the default behavior. However, it can be changed by appending the --no-persist flag to the unmount command.
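When maintenance is complete, the datastore can be remounted from the command line as well, using the same label or UUID (LUN01 is the example volume from above):

esxcli storage filesystem mount -l LUN01

esxcli storage filesystem mount -u 4e414917-a8d75514-6bae-0019b9f1ecf4

Run esxcli storage filesystem list again to confirm that the Mounted field has returned to true.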

VMware Link

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2004605

 

Configure and Administer Profile Driven Storage

What is Profile Driven Storage?

Profile Driven Storage enables you to classify datastores by the level of service they provide. You can use virtual machine storage profiles and storage capabilities to ensure that storage provides the appropriate levels of

  • Capacity
  • Performance
  • Availability
  • Redundancy

By doing this we create compliance levels that virtual machines are linked to, making ongoing management easier and ensuring each virtual machine is placed on storage that is suitable for its use.

storageprofile0

Profile Driven Storage is composed of two components, allowing user-defined capabilities to be used alongside system-defined storage capabilities:

  • Storage capabilities, which detail the features that a storage system offers and are provided by a VASA vendor provider
  • User-defined capabilities, which can be associated with multiple datastores

storageprofile

Instructions for creating Profile Driven Storage

A VM storage profile is attached to a storage capability. In turn, a storage capability is attached to one or more datastores.

  • View the System defined storage capabilities that your storage system defines

vasa

  • Create a user-defined storage capability for your Virtual Machines
  • Go to VM Storage Profiles in vCenter

Capture

  • Click Enable VM Storage Profiles

Capture2

  • View the box which appears and enable these for a host or a cluster and click Close

Capture3

  • Click Manage Storage Capabilities

Capture4

  • Click Add
  • Type a name for your storage capability, e.g. Gold Storage, Silver Storage, Replicated Storage
  • Add a description if you want and click OK

Capture5

  • Next click Create VM Storage Profile
  • Type a name and a description

Capture6

  • Select the storage capability you require from the ones created at the start of these instructions, e.g. Gold Storage, Silver Storage, Replicated Storage

Capture7

  • Click Next and Finish
  • Go to Datastores and Datastore Clusters
  • Right click a Datastore and select Assign User Defined Storage Capability

Capture8

  • Select the capability you created.
  • Now you can create a VM and within the setup wizard on the storage tab, you can select a storage profile to use which will immediately show you which Datastores are compatible and which ones are not
  • On a VM you can also see from the summary tab whether the profile is compliant or not as per below screenprint

compliant

  • And you can also right click on a VM and manage a profile or check profile compliance

Capture10

Resolving Non Compliant VMs

To bring a non-compliant virtual machine into compliance, you must storage-migrate the virtual disks it owns:

  • Enter the Host and Clusters view
  • Select a non-compliant virtual machine
  • Right-Click the Virtual Machine and click Migrate
  • On the migration type screen, click Change Datastore, click Next
  • On the storage screen, optionally select the new disk format for post-migration
  • Select the VM Storage Profile to bring into compliance for the non-compliant VM.
  • If you are migrating an individual virtual disk within a VM, Click Advanced
  • Select the virtual disk you want to move to the new storage profile and then click the Browse under the Datastore column
  • Verify that the VM Storage Profile is correct, if not select the appropriate VM Storage Profile
  • Select a Compatible Datastore Cluster to place your non-compliant virtual disk
  • Optionally, you may disable SDRS for this virtual machine
  • Click OK
  • Click Next
  • Verify your settings at the completion screen and select show all storage recommendations
  • Verify that you agree with the migration recommendations and then click Apply Recommendations
  • Repeat the section above, Check Storage Profile Compliance

Identify and tag SSD Devices

SSD

You can use PSA SATP claim rules to tag SSD devices that are not detected automatically.

Only devices that are consumed by the PSA Native Multipathing (NMP) plugin can be tagged.

Procedure

First find all your relevant information

  • Identify the drive to be tagged and its SATP.
  • For example, our drive is called naa.600605b008f362e01c91d3154a908da1
  • Type esxcli storage nmp device list -d naa.600605b008f362e01c91d3154a908da1
  • The command results in the following information.

SSDTagging

  • Note down the SATP associated with the device.
  • You can also run the following command to get extra information; you can see that Is SSD is marked false when it should be true
  • esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1

SSDTagging2

Create a new SATP Rule

  • Add a PSA claim rule to mark the device as SSD.
  • There are several ways to do this
  • You can add a claim rule by specifying the device name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX -d naa.600605b008f362e01c91d3154a908da1 -o enable_ssd

SSDTagging4

  • You can add a claim rule by specifying the vendor name and the model name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX -V vendor_name -M model_name --option=enable_ssd
  • You can add a claim rule based on the transport protocol.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX --transport transport_protocol --option=enable_ssd
  • You can add a claim rule based on the driver name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX --driver driver_name --option=enable_ssd

Restart the host

  • You now need to restart the host

Unclaiming the device

  • You can now unclaim the device by specifying the device name.
  • esxcli storage core claiming unclaim --type device --device naa.600605b008f362e01c91d3154a908da1

SSDTagging5

  • You can unclaim the device by specifying the vendor name and the model name.
  • esxcli storage core claiming unclaim --type device -V vendor_name -M model_name
  • You can unclaim the device based on the transport protocol.
  • esxcli storage core claiming unclaim --type device --transport transport_protocol
  • You can unclaim the device based on the driver name.
  • esxcli storage core claiming unclaim --type device --driver driver_name

Reclaim the device by running the following commands.

  • esxcli storage core claimrule load
  • esxcli storage core claimrule run
  • esxcli storage core claiming reclaim -d naa.600605b008f362e01c91d3154a908da1

Verify if devices are tagged as SSD.

  • esxcli storage core device list -d device_name
  • or
  • esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1 |grep SSD

SSDTagging6

The command output indicates if a listed device is tagged as SSD.

  • Is SSD: true

What to do next

If the SSD device that you want to tag is shared among multiple hosts, make sure that you tag the device from all the hosts that share the device.

Useful Link

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2013188

Analyse I/O Workloads to determine storage Performance Requirements

What causes Storage Performance issues?

Poor storage performance is generally the result of high I/O latency, but what can cause high I/O latency and how do you address it? Below is a list of things that can cause poor storage performance

Analysis of storage system workloads is important for a number of reasons. The analysis might be performed to understand the usage patterns of existing storage systems. It is very important for the architects to understand the usage patterns when designing and developing a new, or improving upon the existing design of a storage system. It is also important for a system administrator to understand the usage patterns when configuring and tuning a storage system

  • Undersized storage arrays/devices unable to provide the needed performance
  • I/O Stack Queue congestion
  • I/O Bandwidth saturation, Link/Pipe Saturation
  • Host CPU Saturation
  • Guest Level Driver and Queuing Interactions
  • Incorrectly Tuned Applications

Methods of determining Performance Requirements

There are various tools which can give us insight into how our applications are performing on a virtual infrastructure as listed below

  • vSphere Client Counters
  • esxtop/resxtop
  • vscsistats
  • Iometer
  • I/O Analyzer (VMware Fling)

vSphere Client Counters

The most significant counters to monitor for disk performance are

  • Disk Throughput (Disk Read Rate/Disk Write rate/Disk Usage) Monitored per LUN or per Host
  • Disk Latency (Physical Device Read Latency/Physical Device Write Latency no greater than 15ms, and Kernel Disk Read Latency/Kernel Disk Write Latency no greater than 4ms)
  • Number of commands queued
  • Number of active disk commands
  • Number of aborted disk commands (Disk Command Aborts)

ESXTOP/RESXTOP

The most significant counters to monitor for disk performance are below and can be monitored per HBA

  • READs/s – Number of Disk Reads/s
  • WRITEs/s – Number of Disk Writes/s
  • MBREAD/s – MB read per second
  • MBWRTN/s – MB written per second
  • GAVG (Guest Average Latency) total latency as seen from vSphere. GAVG is made up of KAVG and DAVG
  • KAVG (Kernel Average Latency) time an I/O request spent waiting inside the vSphere storage stack. Should be close to 0 but anything greater than 2 ms may be a performance problem
  • QAVG (Queue Average latency) time spent waiting in a queue inside the vSphere Storage Stack.
  • DAVG (Device Average Latency) latency coming from the physical hardware, HBA and storage device. Should be less than 10 ms
  • ACTV – Number of active I/O Operations
  • QUED – I/O operations waiting to be processed. If this is getting into constant double digits then look carefully as the storage hardware cannot keep up with the host
  • ABRTS – A sign of an overloaded system

stroage2

vscsiStats

Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats. vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel. This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:

  • IO size
  • Seek distance
  • Outstanding IOs
  • Latency (in microseconds)

vscsiStats Command Options

  • -l – Lists running virtual machines and their worlds (worldGroupID)
  • -s – Starts vscsiStats data collection
  • -x – Stops vscsiStats data collection
  • -p – Prints histogram information (all, ioLength, seekDistance, outstandingIOs, latency, interarrival)
  • -c – Produces results in a comma-delimited list
  • -h – Displays the help menu for more info
  • seekDistance is the distance in logical block numbers (LBN) that the disk head must travel to read or write a block. If a concentration of your seek distance is very small (less than 1), then the data is sequential in nature. If the seek distance is varied, your level of randomization may be proportional to this distance traveled
  • interarrival is the amount of time in microseconds between virtual machine disk commands.
  • latency is the time of the I/O trip.
  • ioLength is the size of the I/O. This is useful when you are trying to determine how to layout your disks or how to optimize the performance of the guest O/S and applications running on the virtual machines.
  • outstandingIOs will give you an idea of any queuing that is occurring.

Instructions

I found vscsiStats in the following locations

/usr/sbin

/usr/lib/vmware/bin

  • Determine the world number for your virtual machine
  • Log into an SSH session and type
  • cd /usr/sbin
  • vscsiStats -l
  • Record the world ID for the virtual machine you would like to monitor
  • As per example below – 62615

Capture

  • Next capture data for your virtual machine
  • vscsiStats -s -w <worldGroupID>
  • vscsiStats -s -w 62615
  • Although the vscsiStats command exits, it is still gathering data in the background

putty

  • Once it has started, it will automatically stop after 30 minutes
  • Type the below command to display histograms for all in a comma-delimited list
  • vscsiStats -p all -c
  • You will see many of these histograms listed

putty3

  • Type the following to show the latency histogram
  • vscsiStats -p latency

putty2

  • You can also run vscsiStats and output to a file
  • vscsiStats -p latency > /tmp/vscsioutputfile.txt
  • To manually stop the data collection and reset the counters, type the following command
  • vscsiStats -x -w 62615
  • To reset all counters  to zero, run
  • vscsiStats -r
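Putting the commands above together, a typical collection run might look like this (the world ID 62615 and the 5-minute wait are just examples):

vscsiStats -l

vscsiStats -s -w 62615

sleep 300

vscsiStats -p latency -c > /tmp/latency.csv

vscsiStats -x -w 62615

vscsiStats -r

All of the individual commands are described above; the comma-delimited latency histogram is simply redirected to a file for later analysis.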

Iometer

What is Iometer?

http://www.electricmonk.org.uk/2012/11/27/iometer/

Iometer is an I/O subsystem measurement and characterization tool for single and clustered systems. It is used as a benchmark and troubleshooting tool and is easily configured to replicate the behaviour of many popular applications. One commonly quoted measurement provided by the tool is IOPS

Iometer can be used for measurement and characterization of:

  • Performance of disk and network controllers.
  • Bandwidth and latency capabilities of buses.
  • Network throughput to attached drives.
  • Shared bus performance.
  • System-level hard drive performance.
  • System-level network performance.

I/O Analyzer (VMware Fling)

http://labs.vmware.com/flings/io-analyzer

VMware I/O Analyzer is a virtual appliance solution, which provides a simple and standardized way of measuring storage performance in VMware vSphere virtualized environments. I/O Analyzer supports two types of workload generator: IOmeter for synthetic workload and trace replay for real-world application workload. It collects both guest level statistics as well as the host level statistics via VMware VI SDK. Standardizing load generation and stats collection increases the confidence of the customer and VMware engineers in the data collected. It also ensures completeness of data collected

Understand and apply LUN masking using PSA-related commands

index

What is LUN Masking?

LUN (Logical Unit Number) Masking is an authorization process that makes a LUN available to some hosts and unavailable to other hosts. LUN Masking is implemented primarily at the HBA (Host Bus Adapter) level. LUN Masking implemented at this level is vulnerable to any attack that compromises the HBA. Some storage controllers also support LUN Masking.

LUN Masking is important because Windows-based servers attempt to write volume labels to all available LUNs. This can render the LUNs unusable by other operating systems and can result in data loss.

How to MASK on a VMware ESXi Host

  • Step 1: Identifying the volume in question and obtaining the naa ID
  • Step 2: Run the esxcli command to associate/find this naa ID with the vmhba identifiers
  • Step 3: Masking the volume when you want to preserve data from the VMFS volumes for later use or if the volume is already deleted
  • Step 4: Loading the Claim Rules
  • Step 5: Verify that the claimrule has loaded:
  • Step 6: Unclaim the volume in question
  • Step 7: Check Messages
  • Step 8: Unpresent the LUN
  • Step 9: Rescan all hosts
  • Step 10: Restore normal claim rules
  • Step 11: Rescan Datastores

Step 1

  • Check in both places as listed in the table above that you have the correct ID
  • Note: Check every LUN, as sometimes the same datastore is presented with different LUN numbers, and this will affect your commands later

claim3

  • Example Below

LUN

  • Make a note of the naa ID

Step 2

  • Once you have the naa ID from the above step, run the following command
  • Note that we drop the trailing “:1” (the partition number) from the ID
  • The -L parameter shows a compact list of paths

CLAIM2

  • Example below

lun3

  • We can see there are 2 paths to the LUN called C0:T0:L40 and C0:T1:L40
  • C=Channel, T=Target, L=LUN
  • Next we need to check what claim rules already exist so that we do not reuse an existing claim rule number
  • esxcli storage core claimrule list
  • Note I had to revert to the vSphere 4 CLI command as I am screenprinting from vSphere 5 not 4!

claimrule

Step 3

  • At this point you should be absolutely clear what LUN number you are using!

claim4

  • Next, you can use any rule number for the new claim rule that isn’t in the list above; pretty much anything from 101 upwards is fine
  • In theory I have several paths, so I should do this exercise for all of the paths

claim5
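The exact commands are in the screenshots above. As a sketch in vSphere 5 esxcli syntax (the adapter name vmhba2 is an assumption; substitute the vmhba shown in your esxcfg-mpath output), the two mask rules for paths C0:T0:L40 and C0:T1:L40 might look like:

esxcli storage core claimrule add -r 101 -t location -A vmhba2 -C 0 -T 0 -L 40 -P MASK_PATH

esxcli storage core claimrule add -r 102 -t location -A vmhba2 -C 0 -T 1 -L 40 -P MASK_PATH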

Step 4

claim6

  • The Class for those rules will show as file, which means the rule is loaded in /etc/vmware/esx.conf but isn’t yet loaded into runtime.
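In vSphere 5 esxcli syntax, the load step itself is a single command:

esxcli storage core claimrule load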

Step 5

claim

  • Run the following command to see those rules displayed twice, once as the file Class and once as the runtime Class
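In vSphere 5 esxcli syntax this is:

esxcli storage core claimrule list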

Step 6

claim8

  • Before these paths can be associated with the new plugin (MASK_PATH), they need to be disassociated from the plugin they are currently using. In this case those paths are claimed by the NMP plugin (rule 65535). This next command will unclaim all paths for that device and then reclaim them based on the claimrules in runtime.

claim
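As a sketch in vSphere 5 esxcli syntax, using the naa ID from this walkthrough:

esxcli storage core claiming reclaim -d naa.60050768028080befc00000000000050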

Step 7

  • Check Messages

claim9

  • See example below

grep

  • Refresh the Datastore and you should see it vanish from the host view
  • Run the following command to check that it now shows no paths
  • esxcfg-mpath -L | grep naa.60050768028080befc00000000000050

Step 8

  • Now get your Storage Team to remove the LUN from the SAN

Step 9

  • Rescan all hosts and make sure the Datastore has gone

Step 10

  • To restore normal claim rules, perform these steps on every host that had visibility to the LUN, or on all hosts on which you created rules earlier:

claim10
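The screenshot shows the rule removal. As a sketch in vSphere 5 esxcli syntax, assuming the mask rules were numbered 101 and 102 as in the earlier example:

esxcli storage core claimrule remove -r 101

esxcli storage core claimrule remove -r 102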

  • Run esxcli corestorage claimrule load
  • Run esxcli corestorage claimrule list
  • Note that you should no longer see the rules that you created earlier.

claimrule

  • Perform a rescan on all ESX hosts that had visibility to the LUN. If all of the hosts are in a cluster, right-click the cluster and click Rescan for Datastores. Previously masked LUNs should now be accessible to the ESX hosts

Step 11

  • Next, you may have to follow the KB articles linked below if you find these messages in the logs or you cannot add new LUNs
  • Run the following commands on all HBA Adapters

unclaim

Useful Video of LUN Masking

http://www.youtube.com/watch?feature=player_embedded&v=pyNZkZmTKQQ

Useful VMware Docs (ESXi4)

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1029786

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1015252

Useful VMware Doc (ESXi5)

http://kb.vmware.com/selfservice/microsites/search.do?cmd=displayKC&docType=kc&docTypeID=DT_KB_1_1&externalId=2004605