Archive for June 2012

Key Windows Performance Counters, Info and Limits

Counter: LogicalDisk\% Free Space
Description: Measures the percentage of free space on the selected logical disk.
What to watch for: If it is below 15%, you run the risk of running out of space to store critical OS files.

Counter: PhysicalDisk\% Idle Time
Description: Measures the percentage of time the disk was idle during the sample interval.
What to watch for: If this value falls below 20%, the disk system is said to be saturated and you should install a faster disk system.

Counter: PhysicalDisk\Avg. Disk sec/Read
Description: Measures the average time, in seconds, to read data from the disk.
What to watch for: If this value is larger than 25 milliseconds (0.025 in the counter's units), the disk system is experiencing latency. For SQL and Exchange the threshold is lower: 10 ms.

Counter: PhysicalDisk\Avg. Disk sec/Write
Description: Measures the average time, in seconds, to write data to the disk.
What to watch for: If this value is larger than 25 milliseconds (0.025 in the counter's units), the disk system is experiencing latency. For SQL and Exchange the threshold is lower: 10 ms.

Counter: PhysicalDisk\Avg. Disk Queue Length
Description: How many I/O operations are waiting for the hard drive to become available.
What to watch for: If the value is larger than twice the number of disk spindles in the array, the disk may be a bottleneck.

Counter: Memory\Cache Bytes
Description: Indicates the amount of memory being used for the file system cache.
What to watch for: There may be a bottleneck if the value is greater than 300 MB.

Counter: Processor\% Idle Time
Description: The percentage of time the processor is idle during the sample interval.
What to watch for: Below 20%, you are running at CPU saturation if this is prolonged.

Counter: Processor\Interrupts/sec
Description: The number of interrupts the processor was asked to respond to. Interrupts are generated by hardware components such as hard disk controller adapters and network interface cards.
What to watch for: A sustained value over 1000 is usually an indication of a problem. Problems include poorly configured drivers, errors in drivers, excessive utilization of a device (like a NIC on an IIS server), or hardware failure.

Counter: Processor\% Processor Time
Description: Measures how much time the processor actually spends working on productive threads and how often it was busy servicing requests. It is actually a measurement of how often the system is doing nothing, subtracted from 100%, which is a simpler calculation for the processor to make. The processor can never sit idle waiting for the next task; the CPU must always have something to do. When you turn on the computer, the CPU is like a piece of wire that electric current is always running through, so it must always be doing something. NT gives the CPU something to do when there is nothing else waiting in the queue: the idle thread. The system can easily measure how often the idle thread is running, as opposed to having to tally the run time of each of the other process threads. The counter then simply subtracts that percentage from 100%. This counter is a natural choice that will give us the amount of time that this particular process spends using the processor resource.

Counter: Memory\Page Faults/sec
Description: Gives a general idea of how many times information being requested is not where the application (and VMM) expects it to be. The information must either be retrieved from another location in memory or from the pagefile.
What to watch for: While a sustained value may indicate trouble, you should be more concerned with hard page faults, which represent actual reads or writes to the disk. Remember that disk access is much slower than RAM.

Counter: Memory\% Committed Bytes In Use
Description: Indicates the total amount of memory that has been committed for the exclusive use of any of the services or processes on Windows NT.
What to watch for: Should this value approach the commit limit, you will be facing a memory shortage of unknown cause, but of certain severe consequence.

Counter: Memory\Available Bytes
Description: Indicates the amount of memory that is left after nonpaged pool allocations, paged pool allocations, process working sets, and the file system cache have all taken their piece.

Counter: System\System Calls/sec
Description: A measure of the number of calls made to the system components (kernel mode services). It shows how busy the system is taking care of applications and services, that is, software.
What to watch for: Compared with Interrupts/sec, it gives you an indication of whether processor issues are hardware or software related. See Processor\Interrupts/sec for more information.

Counter: System\Threads
Description: The number of threads in the computer at the time of data collection. This is an instantaneous count, not an average over the time interval. A thread is the basic executable entity that can execute instructions in a processor.
What to watch for: Monitor loosely.

Counter: System\Processor Queue Length
Description: Gives an indication of how many threads are waiting for execution.
What to watch for: If this counter is consistently higher than around 5 when processor utilization approaches 100%, this is a good indication that there is more work (active threads) ready for execution than the machine's processors are able to handle. This is not always a hard and fast indicator, however: some services, like IIS 6, pool and manage their own worker threads, so on a busy web server you would also want to look at other counters such as ASP\Requests Queued or ASP.NET\Requests Queued. Furthermore, the larger the number of active services and applications running on your server, the busier the processor queue will normally be, so on a multi-role server running near 100% utilization, contention may only be a significant factor once System\Processor Queue Length exceeds something like 10 instead of 5.

Counter: Network Interface\Bytes Sent/sec
Description: How many bytes of data are sent to the NIC. This is a raw measure of throughput for the network interface. We are really measuring the information sent to the interface, which is the lowest point we can measure. If you have multiple NICs, you will see multiple instances of this counter.
What to watch for: Dependent on NIC speed.

Counter: Network Interface\Bytes Received/sec
Description: How many bytes you get from the NIC. This is a measure of the inbound traffic. In measuring the bytes, NT isn't too particular at this level: no matter what the byte is, it is counted. This includes the framing bytes as opposed to just the data.
What to watch for: Dependent on NIC speed.
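As a rough illustration of how the numeric limits in this list could be applied to a set of samples, here is a minimal Python sketch. The function name check_counters, its parameters, the way samples are keyed (counter paths as Performance Monitor displays them) and the sample values are all assumptions made for the example; only the thresholds come from the list.

```python
# Thresholds are copied from the list of counters; check_counters, its
# parameters and the sample values below are hypothetical illustrations.

def check_counters(samples, spindles=1, sql_or_exchange=False):
    """Return warnings for every sampled counter that crosses its threshold."""
    # Disk latency counters report seconds: 25 ms = 0.025, 10 ms = 0.010.
    latency_limit = 0.010 if sql_or_exchange else 0.025
    rules = [
        (r"LogicalDisk\% Free Space", lambda v: v < 15, "below 15% free"),
        (r"PhysicalDisk\% Idle Time", lambda v: v < 20, "disk idle below 20%"),
        (r"PhysicalDisk\Avg. Disk sec/Read", lambda v: v > latency_limit,
         "read latency too high"),
        (r"PhysicalDisk\Avg. Disk sec/Write", lambda v: v > latency_limit,
         "write latency too high"),
        (r"PhysicalDisk\Avg. Disk Queue Length", lambda v: v > 2 * spindles,
         "queue longer than twice the spindle count"),
        (r"Processor\% Idle Time", lambda v: v < 20, "CPU idle below 20%"),
        (r"Processor\Interrupts/sec", lambda v: v > 1000,
         "more than 1000 interrupts/sec"),
    ]
    warnings = []
    for counter, crosses, message in rules:
        value = samples.get(counter)
        # Counters missing from the sample set are simply skipped.
        if value is not None and crosses(value):
            warnings.append(f"{counter}: {message} (sampled {value})")
    return warnings


# Made-up sample values for a server with a 4-spindle array.
sample = {
    r"LogicalDisk\% Free Space": 12.0,
    r"PhysicalDisk\Avg. Disk sec/Read": 0.012,
    r"PhysicalDisk\Avg. Disk Queue Length": 3.0,
}
for line in check_counters(sample, spindles=4):
    print(line)
```

Several of the limits only matter when the value is sustained, so in practice you would feed this an average over a longer collection period rather than a single sample.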

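The % Processor Time description boils down to one subtraction: measure how long the idle thread ran, turn that into a percentage of the sample interval, and subtract it from 100. A minimal sketch of that arithmetic follows; the function name and the input values are hypothetical, and this is not how Windows exposes the data, only the calculation the description refers to.

```python
def processor_time_percent(idle_seconds, interval_seconds):
    """Busy percentage derived from idle-thread run time over one interval."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    # Share of the interval during which the idle thread was running.
    idle_percent = 100.0 * idle_seconds / interval_seconds
    # Everything that was not idle counts as processor time.
    return 100.0 - idle_percent


# Made-up example: the idle thread ran for 0.75 s of a 1 s sample interval.
print(processor_time_percent(0.75, 1.0))
```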

Storage/Datastore Reclamation in VMware

Sometimes it is worth running a storage reclamation exercise across all your VMware datastores to remove old folders and files, and to check that nothing unexpected is taking up space.

What can you find?

In vCenter > Datastores > Performance tab, you can find a graph showing all the files it can detect. The selection we're interested in is “Other VM Files” or “Other”.

When we checked this on the host back end, logged in via PuTTY, we saw the output below. The ./ files are not usually found on LUNs/datastores and indicate that SAN snapshots exist here.

/vmfs/volumes/4e0da454-902c23bf-cb36-e61f13f7c69b # ls -l

SERVER01
SERVER02
SERVER03

/vmfs/volumes/4e0da454-902c23bf-cb36-e61f13f7c69b # find . -exec ls -lh {} \; | grep flat

SERVER01-flat.vmdk
SERVER01_1-flat.vmdk
SERVER01_2-flat.vmdk
SERVER01_3-flat.vmdk

./SERVER01/SERVER01_3-flat.vmdk
./SERVER01/SERVER01_2-flat.vmdk
./SERVER01/SERVER01_1-flat.vmdk
./SERVER01/SERVER01-flat.vmdk
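If you would rather total up the space than read the listing by eye, the same search can be sketched in Python. This is only an illustration, assuming Python is available wherever the datastore is mounted; the function names find_flat_vmdks and total_gib are hypothetical, and the path in the usage comment is just the volume shown above.

```python
import os


def find_flat_vmdks(root):
    """Return (relative_path, size_in_bytes) for every *-flat.vmdk under root."""
    found = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith("-flat.vmdk"):
                full = os.path.join(dirpath, name)
                found.append((os.path.relpath(full, root),
                              os.path.getsize(full)))
    return sorted(found)


def total_gib(files):
    """Sum the sizes returned by find_flat_vmdks, in GiB."""
    return sum(size for _path, size in files) / 2**30


# Example usage (path taken from the session above):
# files = find_flat_vmdks("/vmfs/volumes/4e0da454-902c23bf-cb36-e61f13f7c69b")
# for path, size in files:
#     print(path, size)
# print(round(total_gib(files), 1), "GiB in flat files")
```

The total gives you a figure to take to the storage admin, but it does not tell you which files belong to snapshots; that still needs checking on the SAN side.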

Conclusion

You will need to ask your storage admin to check your LUNs and confirm whether any old snapshots are still required or can be deleted.

It is worth keeping an eye on all of this: we found nearly 2 TB of LUN snapshots lurking around, taking up valuable and expensive storage space.