
Identify and tag SSD Devices


You can use PSA SATP claim rules to tag SSD devices that are not detected automatically.

Only devices that are consumed by the PSA Native Multipathing Plugin (NMP) can be tagged.

Procedure

First find all your relevant information

  • Identify the drive to be tagged and its SATP.
  • For example, our drive is called naa.600605b008f362e01c91d3154a908da1
  • Type esxcli storage nmp device list -d naa.600605b008f362e01c91d3154a908da1
  • The command results in the following information.

[Screenshot: esxcli storage nmp device list output for the example device]

  • Note down the SATP associated with the device.
  • You can also run the following command to get extra information; note that Is SSD is marked false when it should be true. Filtered one-liners for both checks follow the screenshot below.
  • esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1

[Screenshot: esxcli storage core device list output showing Is SSD: false]
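
If you only need the two values that matter here (the SATP name and the current Is SSD flag), the same commands can be filtered with grep. This is a quick sketch using the example device ID above; substitute your own device ID:

  # Show only the SATP associated with the device
  esxcli storage nmp device list -d naa.600605b008f362e01c91d3154a908da1 | grep "Storage Array Type:"
  # Confirm the current value of the Is SSD flag (expected to be false before tagging)
  esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1 | grep "Is SSD"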

Create a new SATP Rule

  • Add a PSA claim rule to mark the device as SSD.
  • There are several ways to do this, and a quick check to verify the new rule follows this list.
  • You can add a claim rule by specifying the device name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX -d naa.600605b008f362e01c91d3154a908da1 -o enable_ssd

[Screenshot: esxcli storage nmp satp rule add command]

  • You can add a claim rule by specifying the vendor name and the model name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX -V vendor_name -M model_name --option=enable_ssd
  • You can add a claim rule based on the transport protocol.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX --transport transport_protocol --option=enable_ssd
  • You can add a claim rule based on the driver name.
  • esxcli storage nmp satp rule add -s VMW_SATP_CX --driver driver_name --option=enable_ssd
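
Whichever form you use, it is worth confirming the rule is in place before unclaiming anything. A minimal check, assuming the enable_ssd option was added as shown above:

  # List the SATP rules and filter for the enable_ssd option just added
  esxcli storage nmp satp rule list | grep enable_ssd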

Restart the host

  • You now need to restart the host

Unclaiming the device

  • You can now unclaim the device by specifying the device name.
  • esxcli storage core claiming unclaim --type device --device naa.600605b008f362e01c91d3154a908da1

[Screenshot: esxcli storage core claiming unclaim command]

  • You can unclaim the device by specifying the vendor name and the model name.
  • esxcli storage core claiming unclaim --type device -V vendor_name -M model_name
  • You can unclaim the device based on the transport protocol.
  • esxcli storage core claiming unclaim --type device --transport transport_protocol
  • You can unclaim the device based on the driver name.
  • esxcli storage core claiming unclaim --type device --driver driver_name

Reclaim the device by running the following commands.

  • esxcli storage core claimrule load
  • esxcli storage core claimrule run
  • esxcli storage core claiming reclaim -d naa.600605b008f362e01c91d3154a908da1

Verify if devices are tagged as SSD.

  • esxcli storage core device list -d device_name
  • or
  • esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1 | grep SSD

[Screenshot: esxcli storage core device list output showing Is SSD: true]

The command output indicates if a listed device is tagged as SSD.

  • Is SSD: true
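
Putting the steps above together, the whole sequence for the example device looks roughly like this. It is only a consolidated sketch of the commands already shown, not a different method; substitute your own device ID and the SATP you noted earlier:

  # Tag the device as SSD via a SATP claim rule (SATP and device ID from the earlier steps)
  esxcli storage nmp satp rule add -s VMW_SATP_CX -d naa.600605b008f362e01c91d3154a908da1 -o enable_ssd
  # Unclaim the device so the new rule can take effect
  esxcli storage core claiming unclaim --type device --device naa.600605b008f362e01c91d3154a908da1
  # Load and run the claim rules, then reclaim the device
  esxcli storage core claimrule load
  esxcli storage core claimrule run
  esxcli storage core claiming reclaim -d naa.600605b008f362e01c91d3154a908da1
  # Verify the flag is now set
  esxcli storage core device list -d naa.600605b008f362e01c91d3154a908da1 | grep "Is SSD"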

What to do next

If the SSD device that you want to tag is shared among multiple hosts, make sure that you tag the device from all the hosts that share the device.

Useful Link

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2013188

Analyse I/O Workloads to Determine Storage Performance Requirements

What causes Storage Performance issues?

Poor storage performance is generally the result of high I/O latency, but what causes high latency and how do we address it? Below is a list of things that can cause poor storage performance.

Analysis of storage system workloads is important for a number of reasons. The analysis might be performed to understand the usage patterns of existing storage systems. It is very important for architects to understand usage patterns when designing and developing a new storage system or improving an existing design, and it is equally important for a system administrator to understand them when configuring and tuning a storage system.

  • Undersized storage arrays/devices unable to provide the needed performance
  • I/O Stack Queue congestion
  • I/O Bandwidth saturation, Link/Pipe Saturation
  • Host CPU Saturation
  • Guest Level Driver and Queuing Interactions
  • Incorrectly Tuned Applications

Methods of determining Performance Requirements

There are various tools which can give us insight into how our applications are performing on a virtual infrastructure, as listed below:

  • vSphere Client Counters
  • esxtop/resxtop
  • vscsiStats
  • Iometer
  • I/O Analyzer (VMware Fling)

vSphere Client Counters

The most significant counters to monitor for disk performance are

  • Disk Throughput (Disk Read Rate/Disk Write Rate/Disk Usage), monitored per LUN or per host
  • Disk Latency (Physical Device Read Latency/Physical Device Write Latency should be no greater than 15 ms, and Kernel Disk Read Latency/Kernel Disk Write Latency no greater than 4 ms)
  • Number of commands queued
  • Number of active disk commands
  • Number of aborted disk commands (Disk Command Aborts)

ESXTOP/RESXTOP

The most significant counters to monitor for disk performance are below and can be monitored per HBA

  • READs/s – Number of Disk Reads/s
  • WRITEs/s – Number of Disk Writes/s
  • MBREAD/s – MB read per second
  • MBWRTN/s – MB written per second
  • GAVG (Guest Average Latency) total latency as seen from vSphere. GAVG is made up of KAVG and DAVG
  • KAVG (Kernel Average Latency) time an I/O request spent waiting inside the vSphere storage stack. Should be close to 0 but anything greater than 2 ms may be a performance problem
  • QAVG (Queue Average latency) time spent waiting in a queue inside the vSphere Storage Stack.
  • DAVG (Device Average Latency) latency coming from the physical hardware, HBA and storage device. Should be less than 10 ms
  • ACTV – Number of active I/O Operations
  • QUED – I/O operations waiting to be processed. If this is getting into constant double digits then look carefully as the storage hardware cannot keep up with the host
  • ABRTS – A sign of an overloaded system

[Screenshot: esxtop disk statistics]
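
If you want to capture these counters for offline analysis rather than watching them live, esxtop's batch mode can be redirected to a CSV file. A minimal sketch; the 5-second interval and 120 samples are arbitrary example values:

  # Capture all esxtop counters every 5 seconds for 120 samples (about 10 minutes) to a CSV file
  esxtop -b -d 5 -n 120 > /tmp/esxtop-capture.csv

The resulting CSV can be opened in Windows Performance Monitor or a spreadsheet to chart DAVG/KAVG/GAVG over time, and resxtop accepts the same batch options when run remotely.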

vscsiStats

Since ESX 3.5, VMware has provided a tool specifically for profiling storage: vscsiStats. vscsiStats collects and reports counters on storage activity. Its data is collected at the virtual SCSI device level in the kernel. This means that results are reported per VMDK (or RDM) irrespective of the underlying storage protocol. The following data are reported in histogram form:

  • IO size
  • Seek distance
  • Outstanding IOs
  • Latency (in microseconds)

vscsiStats Command Options

  • -l – Lists running virtual machines and their world group IDs (worldGroupID)
  • -s – Starts vscsiStats data collection
  • -x – Stops vscsiStats data collection
  • -p – Prints histogram information (all, ioLength, seekDistance, outstandingIOs, latency, interarrival)
  • -c – Produces results in a comma-delimited list
  • -h – Displays the help menu for more info
  • seekDistance is the distance in logical block numbers (LBN) that the disk head must travel to read or write a block. If a concentration of your seek distance is very small (less than 1), then the data is sequential in nature. If the seek distance is varied, your level of randomization may be proportional to this distance traveled
  • interarrival is the amount of time in microseconds between virtual machine disk commands.
  • latency is the time of the I/O trip.
  • ioLength is the size of the I/O. This is useful when you are trying to determine how to layout your disks or how to optimize the performance of the guest O/S and applications running on the virtual machines.
  • outstandingIOs will give you an idea of any queuing that is occurring.

Instructions

I found vscsiStats in the following locations

/usr/sbin

/usr/lib/vmware/bin

  • Determine the world number for your virtual machine
  • Log into an SSH session and type
  • cd /usr/sbin
  • vscsiStats -l
  • Record the world ID for the virtual machine you would like to monitor
  • As per example below – 62615

[Screenshot: vscsiStats -l output showing the world group ID]

  • Next capture data for your virtual machine
  • vscsiStats -s -w (worldgroup ID)
  • vscsiStats -s -w 62615
  • Although the vscsiStats command returns immediately, it continues gathering data in the background

[Screenshot: starting vscsiStats data collection]

  • Once it has started, it will automatically stop after 30 minutes
  • Type the command below to display all histograms in a comma-delimited list
  • vscsiStats -p all -c
  • You will see many of these histograms listed

[Screenshot: vscsiStats -p all -c output]

  • Type the following to show the latency histogram
  • vscsiStats -p latency

[Screenshot: vscsiStats -p latency histogram output]

  • You can also run vscsiStats and output to a file
  • vscsiStats -p latency > /tmp/vscsioutputfile.txt
  • To manually stop the data collection and reset the counters, type the following command
  • vscsiStats -x -w 62615
  • To reset all counters to zero, run
  • vscsiStats -r
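
Tying the walkthrough together, a typical collection session looks something like the following sketch. World group ID 62615 is the example from above; use the ID reported by vscsiStats -l, and run the tool from one of the paths noted earlier if it is not on your PATH:

  # List running virtual machines and their world group IDs
  vscsiStats -l
  # Start collection for the chosen virtual machine (world group ID 62615 in this example)
  vscsiStats -s -w 62615
  # ...let the workload run, then print the latency histogram and save it to a file
  vscsiStats -p latency > /tmp/vscsioutputfile.txt
  # Stop collection for that world and reset all counters when finished
  vscsiStats -x -w 62615
  vscsiStats -r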

Iometer

What is Iometer?

http://www.electricmonk.org.uk/2012/11/27/iometer/

Iometer is an I/O subsystem measurement and characterization tool for single and clustered systems. It is used as a benchmark and troubleshooting tool and is easily configured to replicate the behaviour of many popular applications. One commonly quoted measurement provided by the tool is IOPS

Iometer can be used for measurement and characterization of:

  • Performance of disk and network controllers.
  • Bandwidth and latency capabilities of buses.
  • Network throughput to attached drives.
  • Shared bus performance.
  • System-level hard drive performance.
  • System-level network performance.

I/O Analyzer (VMware Fling)

http://labs.vmware.com/flings/io-analyzer

VMware I/O Analyzer is a virtual appliance solution which provides a simple and standardized way of measuring storage performance in VMware vSphere virtualized environments. I/O Analyzer supports two types of workload generator: Iometer for synthetic workloads and trace replay for real-world application workloads. It collects both guest-level and host-level statistics via the VMware VI SDK. Standardizing load generation and statistics collection increases the confidence of both customers and VMware engineers in the data collected and helps ensure its completeness.