
Fault Tolerance

What is Fault Tolerance?

Fault Tolerance (FT) is the evolution of continuous availability. It uses VMware vLockstep technology to keep a primary and a secondary virtual machine in sync, and is based on the record/replay technology used in VMware Workstation. Non-deterministic events are streamed from the primary to the secondary, where replay occurs deterministically. The result is instruction-for-instruction, memory-for-memory identical processing on both VMs.

Deterministic means that the processor executes the same instruction stream on the secondary VM.

Non-deterministic refers to events such as network, disk, mouse and keyboard input, as well as hardware interrupts, which are recorded on the primary and played back on the secondary.

FT1

The Primary and Secondary VMs continuously exchange heartbeats. This exchange allows the virtual machine pair to monitor the status of one another to ensure that Fault Tolerance is continually maintained. A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished within a few seconds. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data

Fault Tolerance avoids “split-brain” situations, which can lead to two active copies of a virtual machine after recovery from a failure. Atomic file locking on shared storage is used to coordinate failover so that only one side continues running as the Primary VM and a new Secondary VM is respawned automatically.

Use Cases

  • Applications that need to be available at all times, especially those that have long-lasting client connections that users want to maintain during hardware failure.
  • Custom applications that have no other way of doing clustering.
  • Cases where high availability might be provided through custom clustering solutions, which are too complicated to configure and maintain.
  • On-demand protection for VMs running end-of-month reports or financials

Best Practices for Fault Tolerance

To ensure optimal Fault Tolerance results, VMware recommends that you follow certain best practices. In addition to the following information, see the white paper VMware Fault Tolerance Recommendations and Considerations at http://www.vmware.com/resources/techresources/10040

Requirements for FT

  • Cluster Requirements
  • Host Requirements
  • VM Requirements

Cluster Requirements

  • Host certificate checking must be enabled. This is the default in vSphere 4.1, but you may need to enable it (vCenter Server Settings > SSL Settings > select "vCenter requires verified host SSL certificates")
  • The cluster must have at least two ESXi hosts running the same FT version or build number
  • HA must be enabled on the cluster
  • EVC must be enabled if you want to use FT in conjunction with DRS; without EVC, DRS is disabled for fault tolerant VMs

Host Requirements

  • The ESXi hosts must have access to the same datastores and networks
  • The ESXi hosts must have a FT Logging network setup
  • The FT Logging network must have at least 1Gbit connectivity
  • NICs can be shared if necessary
  • The ESXi hosts CPUs must be FT compatible
  • Host must be licensed for FT
  • Hardware Virtualisation must be enabled in the BIOS of the hosts to enable CPU support for FT
  • It is recommended that Power Management is turned off in the BIOS. This helps ensure uniformity in the CPU speeds

VM Requirements

  • Only VMs with a single vCPU are supported
  • VMs must be running a supported O/S
  • VMs must be stored on shared storage available to all hosts
  • FC, iSCSI, FCOE and NFS are supported
  • A VM's disks must be in eager zeroed thick format or virtual RDMs (physical RDMs are not supported)
  • No VM snapshots
  • The VM must not be a linked clone
  • No USB, Sound devices, serial ports or parallel ports configured
  • The VM cannot use NPIV
  • Nested Page Tables/Extended Page Tables are not supported
  • The VM cannot use NIC Passthrough
  • The VM cannot use the older vlance drivers
  • No CD-ROM or floppy devices attached
  • The VM cannot use a paravirtualised kernel
  • VMs must be on the correct Monitor Mode

monitormode
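For quick validation of some of these VM requirements, a script can be handier than clicking through each VM. The sketch below is a hedged example using pyVmomi (the Python vSphere SDK) and only spot-checks a handful of the rules above (vCPU count, snapshots and unsupported devices); the vCenter address, credentials and VM name are placeholders, and a real check would also need to cover items such as disk format and NPIV.

```python
# Hedged sketch (pyVmomi assumed): spot-check a few FT requirements on a VM.
# Connection details and the VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def find_vm(content, name):
    """Return the first VM in the inventory with the given name."""
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    return next((vm for vm in view.view if vm.name == name), None)

def ft_precheck(vm):
    """List obvious FT blockers for this VM (not an exhaustive check)."""
    issues = []
    if vm.config.hardware.numCPU > 1:
        issues.append("more than one vCPU")
    if vm.snapshot is not None:
        issues.append("VM has snapshots")
    unsupported = (vim.vm.device.VirtualUSBController,
                   vim.vm.device.VirtualSoundCard,
                   vim.vm.device.VirtualSerialPort,
                   vim.vm.device.VirtualParallelPort,
                   vim.vm.device.VirtualCdrom,
                   vim.vm.device.VirtualFloppy)
    for dev in vm.config.hardware.device:
        if isinstance(dev, unsupported):
            issues.append("unsupported device: " + type(dev).__name__)
    return issues

ctx = ssl._create_unverified_context()
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
vm = find_vm(si.RetrieveContent(), "MyVM")
print(ft_precheck(vm) or "No obvious blockers found")
Disconnect(si)
```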

Caveats

  • You can use vMotion but not Storage vMotion, and therefore not Storage DRS
  • Hot plugging of devices is not allowed
  • You cannot change the network settings while the VM is powered on
  • Because snapshots are not supported, you cannot use any backup mechanism that relies on snapshots. You can disable FT before backing up and re-enable it afterwards

Configure FT Networking for Host Machines

On each host that you want to add to a vSphere HA cluster, you must configure two different networking switches so that the host can also support vSphere Fault Tolerance.
To enable Fault Tolerance for a host, you must complete this procedure twice, once for each port group option to ensure that sufficient bandwidth is available for Fault Tolerance logging. Select one option, finish this procedure, and repeat the procedure a second time, selecting the other port group option.

Prerequisites

  • Multiple gigabit Network Interface Cards (NICs) are required. For each host supporting Fault Tolerance, you need a minimum of two physical gigabit NICs. For example, you need one dedicated to Fault Tolerance logging and one dedicated to vMotion.
  • VMware recommends three or more NICs to ensure availability.
  • The vMotion and FT logging NICs must be on different subnets
  • IPv6 is not supported on the FT logging NIC.

Procedure

  • Connect vSphere Client to vCenter Server.
  • In the vCenter Server inventory, select the host and click the Configuration tab.
  • Select Networking under Hardware, and click the Add Networking link
  • The Add Network wizard appears.
  • Select VMkernel under Connection Types and click Next.
  • Select Create a virtual switch and click Next.
  • Provide a label for the switch.
  • Select either Use this port group for vMotion or Use this port group for Fault Tolerance logging and click Next.
  • Provide an IP address and subnet mask and click Next.

ftlogging

  • Click Finish.
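If you prefer to script the same wizard steps, the sketch below (pyVmomi assumed) creates a standard vSwitch with a VMkernel port group and then tags the resulting vmk for FT logging. The switch name, uplink, port group name and IP details are placeholders; the vMotion port group would be created the same way using the "vmotion" nic type.

```python
# Hedged pyVmomi sketch mirroring the wizard: create a vSwitch, add a VMkernel
# port group and NIC, then tag the vmk for Fault Tolerance logging.
from pyVmomi import vim

def add_ft_logging_vmk(host, pnic="vmnic3", ip="192.168.50.11", mask="255.255.255.0"):
    net_sys = host.configManager.networkSystem

    # 1. Create a dedicated standard vSwitch backed by one physical NIC
    vss_spec = vim.host.VirtualSwitch.Specification(
        numPorts=128,
        bridge=vim.host.VirtualSwitch.BondBridge(nicDevice=[pnic]))
    net_sys.AddVirtualSwitch(vswitchName="vSwitchFT", spec=vss_spec)

    # 2. Add a port group for FT logging on that switch
    pg_spec = vim.host.PortGroup.Specification(
        name="FT-Logging", vlanId=0, vswitchName="vSwitchFT",
        policy=vim.host.NetworkPolicy())
    net_sys.AddPortGroup(portgrp=pg_spec)

    # 3. Add a VMkernel NIC with a static IP on the new port group
    nic_spec = vim.host.VirtualNic.Specification(
        ip=vim.host.IpConfig(dhcp=False, ipAddress=ip, subnetMask=mask))
    vmk = net_sys.AddVirtualNic(portgroup="FT-Logging", nic=nic_spec)

    # 4. Tag the new vmk for Fault Tolerance logging traffic
    host.configManager.virtualNicManager.SelectVnicForNicType(
        nicType="faultToleranceLogging", device=vmk)
    return vmk
```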

Networking Example

vMotion and FT Logging can share the same VLAN (configure the same VLAN number in both port groups), but require their own unique IP addresses residing in different IP subnets. However, separate VLANs might be preferred if Quality of Service (QoS) restrictions are in effect on the physical network with VLAN based QoS. QoS is of particular use where competing traffic comes into play, for example, where multiple physical switch hops are used or when a failover occurs and multiple traffic types compete for network resources.

This example uses four port groups configured as follows:

  • VLAN A: Virtual Machine Network port group; active on vmnic2 (to physical switch #1), standby on vmnic0 (to physical switch #2)
  • VLAN B: Management Network port group; active on vmnic0 (to physical switch #2), standby on vmnic2 (to physical switch #1)
  • VLAN C: vMotion port group; active on vmnic1 (to physical switch #2), standby on vmnic3 (to physical switch #1)
  • VLAN D: FT Logging port group; active on vmnic3 (to physical switch #1), standby on vmnic1 (to physical switch #2)

FT3

Instructions for setup

  • Connect to vCenter using the vSphere Client or the Web Client
  • Right click the VM you want to use for FT and select Fault Tolerance > Turn on Fault Tolerance

FT4

  • You will get a message as per below

ft5

vSphere Fault Tolerance Configuration Recommendations

VMware recommends that you observe certain guidelines when configuring Fault Tolerance.

  • In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.
  • If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
  • Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
  • Use a maximum of 16 virtual disks per fault tolerant virtual machine.
  • To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.

Analyze HA cluster capacity to determine optimum cluster size


Cluster Capacity Recommendations

  • How many hosts do you have?
  • How many VMs do you have?
  • Do you have reservations on your VMs?
  • Remember the limit of 32 hosts per cluster
  • Do you have Oracle clustering considerations (CPU Licensing)?
  • Do you have to have separate clusters because of large VMs?
  • By pooling together hosts into larger clusters, DRS is far more efficient at VM placement and providing resource management. It also allows for more efficient HA policy management since the absorption of spare capacity needed for infrequent host failures is now spread out over a larger set of hosts
  • Check all hosts can see shared storage
  • Check your networking. Inconsistency in the network configuration may result in virtual machines losing network connectivity after redistribution, in the event they are moved to a physical host lacking the required VLAN.
  • It’s critical to ensure there is complete network redundancy in all paths between hosts in the cluster. The greatest risk with VMware HA implementations is a false isolation event where an ESX host is incorrectly identified as being offline, triggering an isolation response.
  • DRS load balancing is not instantaneous, so balancing virtual machines with rapidly oscillating load alongside more consistent workloads will likely require a larger cluster to ensure a more rapid redistribution response.
  • There are also performance considerations as to the number of hosts that should concurrently access a single LUN, as well as storage IO considerations. Too many hosts with concurrent access to a single LUN can inflict a performance penalty due to LUN-level SCSI reservations associated with virtual machine file operations and LUN metadata updates.
  • Although an HA cluster can contain up to 32 hosts, recommended cluster sizes generally fall somewhere between eight and twelve, based on the individual organization’s needs.
  • VMware ESX host hardware in the cluster should be as homogeneous as possible, specifically in terms of memory capacity, CPU clock speed, and core count. Because DRS relies on vMotion to migrate running workloads, all hosts must be vMotion compatible
  • It is best to have hosts of the same spec so you don’t create a skewed HA Failover situation.

Analyse performance metrics to calculate host failure requirements


Performance Metrics to use

  • Use the inbuilt vCenter Performance charts

The vCenter Performance charts have 3 advanced settings which allow you to see Effective CPU Resources, Effective Memory Resources and Current Failover level

ClusterPerf

ClusterPerf2

ClusterPerf3

  • Use ESXTOP/RESXTOP

ESXTOP and RESXTOP can be used in batch mode to monitor hosts for up to a day at a time. You can capture all statistics or only the counters you want to monitor. The resulting CSV file can then be imported into Excel, ESXplot or Perfmon on a Windows server to analyse the results. From these you can see whether any resources might restrict the choice of cluster settings you apply. If you are low on resources, selecting the Specify Failover Host admission control policy is probably not the best idea, as dedicating a host as a hot standby when you are already constrained for resources takes even more resource away.
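As a simple example of post-processing that batch output, the sketch below averages the counters in a resxtop CSV (for instance one captured with resxtop -b -d 15 -n 240 > host1.csv). The file name and header pattern are placeholders, since the exact counter names depend on what you chose to capture.

```python
# Illustrative sketch: summarise an esxtop/resxtop batch-mode CSV.
import csv
import statistics

def summarise(csv_path, pattern):
    """Average every numeric column whose header contains `pattern`."""
    samples = {}
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            for header, value in row.items():
                if header and pattern in header:
                    try:
                        samples.setdefault(header, []).append(float(value))
                    except (TypeError, ValueError):
                        pass   # skip blanks and non-numeric cells
    return {h: statistics.mean(v) for h, v in samples.items() if v}

# Example: rough view of physical CPU utilisation over the capture window
for counter, avg in summarise("host1.csv", "Physical Cpu(_Total)").items():
    print(f"{counter}: {avg:.1f}")
```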

Analyze vSphere environment to determine appropriate HA admission control policy


Choosing an Admission Control Policy

You should choose a vSphere HA admission control policy based on your availability needs and the characteristics of your cluster. When choosing an admission control policy, you should consider a number of factors.

  • Avoiding Resource Fragmentation

Resource fragmentation occurs when there are enough resources in aggregate for a virtual machine to be failed over. However, those resources are located on multiple hosts and are unusable because a virtual machine can run on one ESXi host at a time. The Host Failures Cluster Tolerates policy avoids resource fragmentation by defining a slot as the maximum virtual machine reservation. The Percentage of Cluster Resources policy does not address the problem of resource fragmentation. With the Specify Failover Hosts policy, resources are not fragmented because hosts are reserved for failover.

  • Flexibility of Failover Resource Reservation

Admission control policies differ in the granularity of control they give you when reserving cluster resources for failover protection. The Host Failures Cluster Tolerates policy allows you to set the failover level as a number of hosts. The Percentage of Cluster Resources policy allows you to designate up to 100% of cluster CPU or memory resources for failover. The Specify Failover Hosts policy allows you to specify a set of failover hosts.

  • Heterogeneity of Cluster

Clusters can be heterogeneous in terms of virtual machine resource reservations and host total resource capacities. In a heterogeneous cluster, the Host Failures Cluster Tolerates policy can be too conservative because it only considers the largest virtual machine reservations when defining slot size and assumes the largest hosts fail when computing the Current Failover Capacity. The other two admission control policies are not affected by cluster heterogeneity.
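To make the slot mechanics concrete, here is a toy Python illustration of the Host Failures Cluster Tolerates arithmetic: slot size driven by the largest reservations, slots per host, and spare capacity assuming the largest hosts are the ones that fail. Real HA also factors in virtual machine memory overhead and the slot-size advanced options, so the figures are purely illustrative.

```python
# Toy illustration of the "Host Failures Cluster Tolerates" slot maths.
def slot_size(vms, default_cpu_mhz=32, default_mem_mb=128):
    """Slot size is driven by the largest CPU and memory reservations.
    The memory default is a stand-in for overhead when no reservation is set."""
    cpu = max([vm["cpu_res_mhz"] for vm in vms] + [default_cpu_mhz])
    mem = max([vm["mem_res_mb"] for vm in vms] + [default_mem_mb])
    return cpu, mem

def spare_slots(hosts, vms, tolerated_failures=1):
    cpu_slot, mem_slot = slot_size(vms)
    slots_per_host = [min(h["cpu_mhz"] // cpu_slot, h["mem_mb"] // mem_slot)
                      for h in hosts]
    # Worst case: assume the largest hosts are the ones that fail
    surviving = sorted(slots_per_host)
    if tolerated_failures:
        surviving = surviving[:-tolerated_failures]
    return sum(surviving) - len(vms)   # spare slots after all VMs are placed

hosts = [{"cpu_mhz": 24000, "mem_mb": 65536}] * 3      # three identical hosts
vms = [{"cpu_res_mhz": 500, "mem_res_mb": 2048}] * 20  # twenty identical VMs
print("Spare slots with one host failure:", spare_slots(hosts, vms))
```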

Understand interactions between DRS and HA


Using vSphere HA and DRS Together

Using vSphere HA with Distributed Resource Scheduler (DRS) combines automatic failover with load balancing. This combination can result in a more balanced cluster after vSphere HA has moved virtual machines to different hosts.
When vSphere HA performs failover and restarts virtual machines on different hosts, its first priority is the immediate availability of all virtual machines. After the virtual machines have been restarted, those hosts on which they were powered on might be heavily loaded, while other hosts are comparatively lightly loaded.
vSphere HA uses the virtual machine’s CPU and memory reservation to determine if a host has enough spare capacity to accommodate the virtual machine.

In a cluster using DRS and vSphere HA with admission control turned on, virtual machines might not be evacuated from hosts entering maintenance mode. This behavior occurs because of the resources reserved for restarting virtual machines in the event of a failure. You must manually migrate the virtual machines off of the hosts using vMotion.
In some scenarios, vSphere HA might not be able to fail over virtual machines because of resource constraints. This can occur for several reasons.

  • HA admission control is disabled and Distributed Power Management (DPM) is enabled. This can result in DPM consolidating virtual machines onto fewer hosts and placing the empty hosts in standby mode leaving insufficient powered-on capacity to perform a failover.
  • VM-Host affinity (required) rules might limit the hosts on which certain virtual machines can be placed.
  • There might be sufficient aggregate resources, but these can be fragmented across multiple hosts so that they cannot be used by virtual machines for failover.

In such cases, vSphere HA can use DRS to try to adjust the cluster (for example, by bringing hosts out of standby mode or migrating virtual machines to defragment the cluster resources) so that HA can perform the failovers.
If DPM is in manual mode, you might need to confirm host power-on recommendations. Similarly, if DRS is in manual mode, you might need to confirm migration recommendations. If you are using VM-Host affinity rules that are required, be aware that these rules cannot be violated. vSphere HA does not perform a failover if doing so would violate such a rule.

Configure HA Related alarms and monitor a HA Cluster


Monitoring a cluster

  • Highlight the cluster
  • Select Summary
  • You will see the following page

ha3

  • Click Advanced Runtime. This tells you the current state of your cluster including the slot information and host information

ha4

  • Click on Cluster Operational Status to see informational and warning messages associated with your cluster

ha5

  • Click Cluster Status and tab through the below options

ha1

ha2

ha3

Configuring Alarms

You may set up custom alerts of your own to monitor the health of your HA cluster. However, VMware provides a number of default alarms, described below, and these alarms can be edited to take custom actions. To access these settings, connect to the vCenter Server using the vSphere Client:

  1. Enter the Host and Clusters view (Ctrl + Shift + H)
  2. Highlight the vCenter Server
  3. Click on the Alarms tab
  4. Select the Definitions view
  5. Select one of the below described HA default alarms
  6. Right click the alarm and select Edit Settings
  7. Click the Actions tab

ha6

Default System Actions

  1. Send a Notification email
  2. Send a Notification trap
  3. Run a command
  4. Power on VM
  5. Power off VM
  6. Suspend VM
  7. Reset VM
  8. Migrate VM
  9. Reboot Guest VM
  10. Shutdown Guest VM

To Configure a Sender Email Account for the vCenter Server

  1. Enter the Server Settings dialog (Ctrl + Shift + I)
  2. Click on Mail
  3. Enter an SMTP Server
  4. Enter a Sender Account
  5. Note: Additional settings will be required to allow vCenter to use a pre-defined account to send emails. Check your third-party email server documentation when setting up this account.
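The same Mail settings can also be pushed through the vCenter advanced settings (OptionManager). The sketch below assumes pyVmomi and the commonly documented mail.smtp.server and mail.sender keys; verify the key names against your vCenter version before relying on it.

```python
# Hedged sketch: set the vCenter Mail settings via the advanced settings keys.
from pyVmomi import vim

def configure_mail(si, smtp_server, sender):
    opt_mgr = si.RetrieveContent().setting   # vCenter Server OptionManager
    opt_mgr.UpdateOptions(changedValue=[
        vim.option.OptionValue(key="mail.smtp.server", value=smtp_server),
        vim.option.OptionValue(key="mail.sender", value=sender),
    ])

# Example usage (si is an existing SmartConnect session):
# configure_mail(si, "smtp.example.com", "vcenter@example.com")
```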

To Configure SNMP

  1. Enter the Server Settings dialog (Ctrl + Shift + I)
  2. Click on SNMP
  3. Enter up to four different Receiver URLs and Community Strings for your SNMP environment

Configure customised isolation response settings

HA Isolation Responses

As seen in the below diagram, when HA detects a failure on one of the hosts, a response is triggered to deal with the Virtual Machines on that host

ha

Host Isolation Responses

First we need to look at Host Monitoring, which is a checkbox within the HA settings.

Host Monitoring

The restarting by VMware HA of virtual machines on other hosts in the cluster in the event of a host isolation or host failure is dependent on the “host monitoring” setting. If host monitoring is disabled, the restart of virtual machines on other hosts following a host failure or isolation is also disabled. Disabling host monitoring also impacts VMware Fault Tolerance because it controls whether HA will restart a Fault Tolerance (FT) secondary virtual machine after an event. Essentially a host will always perform the programmed host isolation response when it determines it is isolated. The host monitoring setting determines if virtual machines will be restarted elsewhere following this event.

iso2

Isolation Responses

When an isolation response is triggered, the isolated host must determine whether it must take any action, based upon the isolation response configured for each virtual machine that is powered on. The isolation response setting dictates the action taken for the powered-on virtual machines on a host when that host declares itself isolated. There are three possible isolation response values, which can be applied to the cluster as a whole or individually to a specific virtual machine. These are:

  • Leave Powered On
  • Power Off
  • Shut Down

isolation

Leave Powered On

With this option, virtual machines hosted on an isolated host are left powered on. In situations where a host loses all management network access, it might still have the ability to access the storage subsystem and the virtual machine network. Selecting this option enables the virtual machine to continue to function if this were to occur. This is now the default isolation response setting in vSphere High Availability 5.0.

Power Off

When this isolation response option is used, the virtual machines on the isolated host are immediately stopped. This is similar to removing the power from a physical host. This can induce inconsistency with the file system of the OS used in the virtual machine. The advantage of this action is that VMware HA will attempt to restart the virtual machine more quickly than when using the third option.

Shut Down

Through the use of the VMware Tools package installed within the guest operating system, this option attempts to gracefully shut down the operating system within the virtual machine before powering it off. This is more desirable than the Power Off option because it gives the OS time to commit any outstanding I/O activity to disk. HA will wait for a default of 300 seconds (5 minutes) for this graceful shutdown to occur. If the OS has not shut down gracefully by this time, the virtual machine is powered off. Changing the das.isolationshutdowntimeout attribute will modify this timeout if it is determined that more time is required to gracefully shut down an OS. The Shut Down option requires that VMware Tools be installed in the guest OS. Otherwise, it is equivalent to the Power Off setting.

Best Practices

From a best practices perspective, Leave Powered On is the recommended isolation response setting for the majority of environments. Isolated hosts are a rare event in a properly architected environment, given the redundancy built in. In environments that use network-based storage protocols, such as iSCSI and NFS, the recommended isolation response is Power Off. With these environments, it is highly likely that a network outage that causes a host to become isolated will also affect the host’s ability to communicate to the datastores.
An isolated host will initiate the configured isolation response for a running virtual machine if either of the following is true

  • The host lost access to the datastore containing the configuration (.vmx) file for the virtual machine
  • The host still has access to the datastore and it determined that a master is responsible for the virtual machine.

To determine this, the isolated host checks for the accessibility of the “home datastore” for each virtual machine and whether the virtual machines on that datastore are “owned” by a master, which is indicated by a master’s having exclusively locked a key file that HA maintains on the datastore. After declaring itself as being isolated, the isolated host releases any locks it might have held on any datastores. It then checks periodically to see whether a master has obtained a lock on the datastore. After a lock is observed on the datastore by the isolated host, the HA agent on the isolated host applies the configured isolation response. Ensuring that a virtual machine is under continuous protection by a master provides an additional layer of protection. Because only one master can lock a datastore at a given time, this significantly reduces chances of “split-brain” scenarios. This also protects against situations where a complete loss of the management networks without a complete loss of access to storage would make all the hosts in a cluster determine they were isolated.
In certain environments, it is possible for a loss of the management network to also affect access to the heartbeat datastores. This is the case when the heartbeat datastores are hosted via NFS that is tied to the management network in some manner. In the event of a complete loss of connectivity to the management network and the heartbeat datastores, the isolation response activity resembles that observed in vSphere 4.x.
In this configuration, the isolation response should be set to Power Off so that another host with access to the network can attempt to power on the virtual machine.
There is a situation where the isolation response will likely take an extended period of time to transpire. This occurs when all paths to storage are disconnected, referred to as an all-paths-down (APD) state, and the APD condition does not impact all of the datastores mounted on the host. This is due to the fact that there might be outstanding write requests to the storage subsystem that must time out. Establishing redundant paths to the storage subsystem will help prevent an APD situation and this issue.

Configure HA Redundancy

HA in vSphere 5

The way HA works in vSphere 5 is quite different to the way it worked in vSphere 4. vSphere HA now uses a new agent called FDM (Fault Domain Manager), which was developed to replace AAM (Automated Availability Manager). AAM had limitations, including its reliance on name resolution and its scalability limits. Improvements include:

  • FDM uses a Master/Slave architecture which does not rely on Primary/Secondary host designations
  • FDM uses both the management network and storage devices for communications
  • FDM introduces support for IPv6
  • FDM addresses the issues of network partition and network isolation

How it works

  • When vSphere HA is enabled, the vSphere HA agents enter an election to pick a vSphere HA Master.
  • The vSphere HA master monitors slave hosts and will restart VMs in the event of a failover
  • The vSphere HA master monitors the power state of all protected machines and if the VM fails, it will be restarted
  • The vSphere HA master manages the list of hosts that are members of the cluster and manages the adding/removing of hosts into a cluster
  • The vSphere HA master manages the list of protected VMs
  • The vSphere HA master caches the cluster configuration. The master notifies and informs slave hosts of changes in the cluster
  • The vSphere HA master sends heartbeat messages to the slave hosts so the slaves know the master is still alive
  • The vSphere HA master reports state information to vCenter (Only the master does this)

HA Process

  • The hosts within an HA cluster constantly heartbeat with the host designated as the master over the management network. The first step in determining whether a host is isolated is detecting a lack of these heartbeats.
  • After a host stops communicating with the master, the master attempts to determine the cause of the issue.
  • Using heartbeat datastores, the master can distinguish whether the host is still alive by determining if the affected host is maintaining heartbeats to the heartbeat datastores. This enables the master to differentiate between a management network failure, a dead host and a partitioned/isolated situation.
  • The time elapsed before the host declares itself isolated varies depending on the role of the host (master or slave) at the time of the loss of heartbeats.
  • If the host was a master, it will declare itself isolated within 5 seconds.
  • If the host was a slave, it will declare itself isolated in 30 seconds.
  • The difference in time is due to the fact that if the host was a slave, it then must go through an election process to identify whether any other hosts exist or if the master host simply died. This election process starts for the slave at 10 seconds after the loss of heartbeats is detected.
  • If the host sees no response from another host during the election for 15 seconds, the HA agent on a host then elects itself as a master, checks whether it is isolated and, if so, drops into a startup state.
  • In short, a host will begin to check to see whether it is isolated whenever it is a master in a cluster with more than one other host and has no slaves. It will continue to do so until it becomes a master with a slave or connects to a master as a slave.
  • At this point, the host will attempt to ping its configured isolation addresses to determine the viability of the network. The default isolation address is the gateway specified for the management network.
  • Advanced settings can be used to modify the isolation addresses used for your particular environment. The option das.isolationaddress[X] (where X is 1–10) is used to configure multiple isolation addresses.
  • Additionally, das.usedefaultisolationaddress is used to indicate whether the default isolation address (the default gateway) should be used to determine if the host is network isolated. If the default gateway is not able to receive ICMP ping packets, you must set this option to “false.”
  • It is recommended to set one isolation address for each management network used, keeping in mind that the management network links should be redundant, as previously mentioned.
  • The isolation address used should always be reachable by the host under normal situations, because after 5 seconds have elapsed with no response from the isolation addresses, the host then declares itself isolated.
  • After this occurs, it will attempt to inform the master of its isolated state by use of the heartbeat datastores.

What does HA use?

  • Management Network
  • Datastore Heartbeats

Management Network

Ideally the Management network should be set up as a fully redundant network team at the adapter level or at the Management network level. It can either be set up as EtherChannel with Route based on IP Hash or in an Active/Standby configuration, allowing for failover should one network card fail.

In the event that the vSphere HA Master cannot communicate with a slave through the management network isolation address, it can then check its Heartbeat Datastores to see if the host is still up and running. This helps vSphere HA deal with Network Partitioning and Network Isolation

Switch Setup

Requirements:

  • Two physical network adaptors
  • VLAN trunking
  • Two physical switches

The vSwitch should be configured as follows:

  • Load balancing = route based on the originating virtual port ID (default)
  • Failback = no
  • vSwitch0: Two physical network adaptors (for example: vmnic0 and vmnic2)
  • Two port groups (for example, vMotion and management)

In this example, the management network runs on vSwitch0 as active on vmnic0 and as standby on vmnic2. The vMotion network runs on vSwitch0 as active on vmnic2 and as standby on vmnic0
Each port group has a VLAN ID assigned and runs dedicated on its own physical network adaptor. Only in the case of a failure is it switched over to the standby network adaptor. Failback is set to “no” because in the case of physical switch failure and restart, ESXi might falsely recognize that the switch is back online when its ports first come online. In reality, the switch might not be forwarding on any packets until it is fully online. However, when failback is set to “no” and an issue arises, both your management network and vMotion network will be running on the same network adaptor and will continue running until you manually intervene.

Network Partitioning

This is what happens when one or more of the slaves cannot communicate with the vSphere HA master even though they still have network connectivity. The master then checks the datastore heartbeats to see whether the slave hosts are still alive.

When a management network failure occurs for a vSphere HA cluster, a subset of the cluster’s hosts might be unable to communicate over the management network with the other hosts. Multiple partitions can occur in a cluster.
A partitioned cluster leads to degraded virtual machine protection and cluster management functionality. Correct the partitioned cluster as soon as possible.

Virtual machine protection. vCenter Server allows a virtual machine to be powered on, but it is protected only if it is running in the same partition as the master host that is responsible for it. The master host must be communicating with vCenter Server. A master host is responsible for a virtual machine if it has exclusively locked a system-defined file on the datastore that contains the virtual machine’s configuration file.

Cluster management. vCenter Server can communicate with only some of the hosts in the cluster, and it can connect to only one master host. As a result, changes in configuration that affect vSphere HA might not take effect until after the partition is resolved. This failure could result in one of the partitions operating under the old configuration.

Advanced Cluster Settings which can be used

  • das.isolationaddress[X]: used to configure multiple isolation addresses.
  • das.usedefaultisolationaddress: set to true/false; used in the case where the default gateway is not pingable, in which case set this to false in conjunction with configuring another address via das.isolationaddress.
  • das.failuredetectiontime: increase to 30 seconds (30000 ms) to decrease the likelihood of a false positive.

NOTE: If you change the value of any of these advanced attributes, you must disable and then re-enable vSphere HA before your changes take effect.
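For reference, a hedged pyVmomi sketch of pushing these das.* options to a cluster is shown below; the addresses are placeholders, and, as the note says, HA must be disabled and re-enabled afterwards for the change to take effect.

```python
# Hedged sketch: set HA advanced options on a cluster via the standard
# cluster reconfigure API. Option values are placeholders.
from pyVmomi import vim

def set_ha_advanced_options(cluster):
    das = vim.cluster.DasConfigInfo(option=[
        vim.option.OptionValue(key="das.isolationaddress0", value="192.168.1.1"),
        vim.option.OptionValue(key="das.usedefaultisolationaddress", value="false"),
    ])
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    # modify=True merges these settings into the existing cluster configuration
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```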

Network Isolation

This is where one or more slave hosts have lost all management network connectivity. Isolated hosts can neither communicate with the master nor with other slaves. The slave host uses the heartbeat datastore to notify the master that it is isolated, via a special binary file called the host-X-poweron file.

Datastore Heartbeating

By default, vCenter will automatically select two datastores to use for storage heartbeats. An algorithm designed to maximize availability and redundancy of the storage heartbeats selects these datastores. This algorithm attempts to select datastores that are connected to the highest number of hosts. It also attempts to select datastores that are hosted on different storage arrays/NFS servers. A preference is given to VMware vSphere VMFS–formatted datastores, although NFS-hosted datastores can also be used.

  • Highlight your cluster
  • Click Edit Settings
  • Select Datastore Heartbeating

Datastore Heartbeat

  • Select only from my preferred datastores restricts HA to using only those selected from the list
  • Select any of the cluster datastores disables the selection of datastores from the list. Any cluster datastore can be used by HA for heartbeating
  • Select any of the cluster datastores taking into account my preferences is a mix of the previous 2 options. The Admin selects the preferred datastores that HA should use. vSphere selects which ones to use from these. If any become unavailable, HA will choose another from the list.
  • The vSphere HA Cluster Status box will show you which datastores are being used

Datastore Heartbeat2
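The same three choices map onto the cluster's HA configuration in the API. The sketch below (pyVmomi assumed) sets the "preferences" policy; the policy strings are the API enum values and the datastore list is a placeholder you would look up first.

```python
# Hedged sketch: configure heartbeat datastore selection on a cluster.
from pyVmomi import vim

def set_heartbeat_datastores(cluster, preferred_datastores):
    das = vim.cluster.DasConfigInfo(
        # "allFeasibleDsWithUserPreference" = any cluster datastore, taking my
        # preferences into account; other values are "userSelectedDs" and
        # "allFeasibleDs".
        hBDatastoreCandidatePolicy="allFeasibleDsWithUserPreference",
        heartbeatDatastore=preferred_datastores)   # list of vim.Datastore refs
    spec = vim.cluster.ConfigSpecEx(dasConfig=das)
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```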

VMware Availability Guide

http://pubs.vmware.com

vSphere High Availability Deployment Best Practices

http://www.vmware.com/files/pdf/techpaper/vmw-vsphere-high-availability.pdf

 

Create DRS and DPM Alarms


DRS Alarms

  • Right click the Datacenter or Cluster and Select Alarm > Add New Alarm
  • Select Clusters

DRSALARM

  • Click Triggers  > Add

drsalarm2

  • You can click Advanced and tune the alarm even more

drs4

  • Click Reporting

DRSREP

  • Click Actions and choose how to be notified

DRSACTION

DPM Alarms

You can use event-based alarms in vCenter Server to monitor vSphere DPM.
The most serious potential error you face when using vSphere DPM is the failure of a host to exit standby mode when its capacity is needed by the DRS cluster. You can monitor for instances when this error occurs by using the preconfigured Exit Standby Error alarm in vCenter Server. If vSphere DPM cannot bring a host out of standby mode (vCenter Server event DrsExitStandbyModeFailedEvent), you can configure this alarm to send an alert email to the administrator or to send notification using an SNMP trap. By default, this alarm is cleared after vCenter Server is able to successfully connect to that host.

To monitor vSphere DPM activity, you can also create alarms for the following vCenter Server events.

  • Entering Standby mode (about to power off host) DrsEnteringStandbyModeEvent
  • Successfully entered Standby mode (host power off succeeded) DrsEnteredStandbyModeEvent
  • Exiting Standby mode (about to power on the host) DrsExitingStandbyModeEvent
  • Successfully exited Standby mode (power on succeeded) DrsExitedStandbyModeEvent
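A scripted equivalent of defining such an event-based alarm is sketched below using pyVmomi; the alarm name, email recipient and transition details are placeholders, so check the alarm API documentation before using it as-is.

```python
# Hedged pyVmomi sketch: create an event-based alarm for the DPM wake-up
# failure event (DrsExitStandbyModeFailedEvent) with an email action.
from pyVmomi import vim

def create_dpm_exit_standby_alarm(si, cluster):
    expr = vim.alarm.EventAlarmExpression(
        eventType=vim.event.DrsExitStandbyModeFailedEvent,  # DPM wake-up failure
        objectType=vim.HostSystem,
        status="red")

    email = vim.action.SendEmailAction(
        toList="vmware-admins@example.com",
        ccList="",
        subject="DPM failed to bring a host out of standby",
        body="Check the host and consider disabling Power Management for it.")

    action = vim.alarm.AlarmTriggeringAction(
        action=email,
        transitionSpecs=[vim.alarm.AlarmTriggeringAction.TransitionSpec(
            startState="yellow", finalState="red", repeats=False)])

    spec = vim.alarm.AlarmSpec(
        name="DPM exit standby failure (custom)",
        description="Raised when DrsExitStandbyModeFailedEvent occurs",
        enabled=True,
        expression=vim.alarm.OrAlarmExpression(expression=[expr]),
        action=vim.alarm.GroupAlarmAction(action=[action]))

    return si.RetrieveContent().alarmManager.CreateAlarm(entity=cluster, spec=spec)
```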

Alarm setup Instructions

  • Click the Datacenter or Cluster and Select Alarms
  • Click Definitions
  • Double click on Exit Standby Error
  • Select Hosts for Alarm Type

dpm1

  • Select Triggers

DPM

  • Select Reporting

DRSREP

  • Select Actions

dpm2

Storage DRS Alarms

  • Right click the Datacenter or Cluster and Select Alarm > Add New Alarm
  • Select Datastore Clusters

sdrs1

  • Select Triggers

sdrs2

  • Select Reporting

DRSREP

  • Select Action

DRSACTION

Great Alarm Link from the Communities

http://communities.vmware.com/servlet/JiveServlet/download/12145-1-35516/vSphere%20Alarms%20v2.xlsx

DPM Explained


What is DPM?

The vSphere Distributed Power Management (DPM) feature allows a DRS cluster to reduce its power consumption by powering hosts on and off based on cluster resource utilization.
vSphere DPM monitors the cumulative demand of all virtual machines in the cluster for memory and CPU resources and compares this to the total available resource capacity of all hosts in the cluster. If sufficient excess capacity is found, vSphere DPM places one or more hosts in standby mode and powers them off after migrating their virtual machines to other hosts. Conversely, when capacity is deemed to be inadequate, DRS brings hosts out of standby mode (powers them on) and uses vMotion to migrate virtual machines to them. When making these calculations, vSphere DPM considers not only current demand, but it also honors any user-specified virtual machine resource reservations.

ESXi hosts cannot automatically be brought out of standby mode unless they are running in a cluster managed by vCenter Server.

Power Management Protocols

vSphere DPM can use one of three power management protocols to bring a host out of standby mode:

  • Intelligent Platform Management Interface (IPMI)
  • Hewlett-Packard Integrated Lights-Out (iLO), or
  • Wake-On-LAN (WOL)

Each protocol requires its own hardware support and configuration. If a host does not support any of these protocols it cannot be put into standby mode by vSphere DPM. If a host supports multiple protocols, they are used in the following order: IPMI, iLO, WOL.

Configure IPMI or iLO Settings for vSphere DPM (Host First)

IPMI is a hardware-level specification and Hewlett-Packard iLO is an embedded server management technology. Each of them describes and provides an interface for remotely monitoring and controlling computers.
You must perform the following procedure on each host.

Prerequisites

  • Both IPMI and iLO require a hardware Baseboard Management Controller (BMC) to provide a gateway for accessing hardware control functions, and allow the interface to be accessed from a remote system using serial or LAN connections. The BMC is powered-on even when the host itself is powered-off. If properly enabled, the BMC can respond to remote power-on commands.
  • If you plan to use IPMI or iLO as a wake protocol, you must configure the BMC. BMC configuration steps vary according to model. See your vendor’s documentation for more information.
  • With IPMI, you must also ensure that the BMC LAN channel is configured to be always available and to allow operator-privileged commands.
  • On some IPMI systems, when you enable “IPMI over LAN” you must configure this in the BIOS and specify a particular IPMI account.
  • vSphere DPM using only IPMI supports MD5- and plaintext-based authentication, but MD2-based authentication is not supported. vCenter Server uses MD5 if a host’s BMC reports that it is supported and enabled for the Operator role. Otherwise, plaintext-based authentication is used if the BMC reports it is supported and enabled. If neither MD5 nor plaintext authentication is enabled, IPMI cannot be used with the host and vCenter Server attempts to use Wake-on-LAN.

Instructions to Configure BMC from UEFI

(Alternatively, you may configure IPMI by pressing Ctrl + E during boot)

During these steps, remember to record the IP address and MAC address of the BMC.

  • Power on the host
  • Enter the Unified Server Configurator (UEFI v2.1) by pressing F10 (System Services) at boot
  • After the application starts, select Configuration Wizards
  • Select iDRAC Configuration
  • Enable IPMI Over LAN, click Next
  • Enter a Host Name String that aligns with the ESXi host name, click Next
  • Enter a unique IP Address and the details of your network, click Next
  • Optionally configure IPv6, click Next
  • Click Next at the Virtual Media Configuration screen
  • At the LAN User Configuration screen, configure an account and password, click Next
  • At the summary screen, click Apply
  • Click Finish, Back, Exit and Reboot

Configure Wake On LAN

  • Power on the host
  • Press Ctrl + S to Enter Broadcom Comprehensive Configuration Management
  • Select the adapter to be used for WOL
  • Select MBA Configuration
  • Enable Pre-boot Wake On LAN
  • Press Escape, Escape, Save and Exit
  • Repeat steps 3 – 6 for any other adapters
  • Press Escape

Configure IPMI/iLO for vSphere 5

Note: This may only be done from a connection to vCenter. This is a feature that relies upon DRS.

  • First obtain the IP address and MAC address of the BMC from the IPMI/iLO interface (or from the WOL adapter configuration)
  • Press Ctrl + Shift + H to enter the Hosts and Clusters view from within the vSphere Client
  • Select the host on which to enable IPMI/iLO (you should configure IPMI/iLO on all hosts in your cluster)
  • Click the Configuration tab > Software > Power Management
  • Click Properties
  • Enter the following information:
  • User name and password for a BMC account (the user name must have the ability to remotely power the host on)
  • IP address of the NIC associated with the BMC, as distinct from the IP address of the host. The IP address should be static or a DHCP address with an infinite lease
  • MAC address of the NIC associated with the BMC

DPM

  • Click OK.
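The same BMC details can be supplied through the API with HostSystem.UpdateIpmi, as in the hedged sketch below; the credentials and addresses are placeholders.

```python
# Hedged sketch: configure the BMC details for DPM via the host API.
from pyVmomi import vim

def configure_ipmi(host, bmc_ip, bmc_mac, user, password):
    host.UpdateIpmi(ipmiInfo=vim.host.IpmiInfo(
        bmcIpAddress=bmc_ip,          # IP of the BMC NIC, not the host itself
        bmcMacAddress=bmc_mac,        # MAC of the BMC NIC
        login=user,                   # BMC account able to power the host on
        password=password))

# Example usage with placeholder values:
# configure_ipmi(host, "10.0.0.50", "aa:bb:cc:dd:ee:ff", "root", "secret")
```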

Configure Wake on LAN Settings for vSphere DPM (Host First)

The use of Wake-on-LAN (WOL) for the vSphere DPM feature is fully supported, if you configure and successfully test it according to the VMware guidelines. You must perform these steps before enabling vSphere DPM for a cluster for the first time or on any host that is being added to a cluster that is using vSphere DPM.

Prerequisites

Before testing WOL, ensure that your cluster meets the prerequisites.

  • Your cluster must contain at least two ESX 3.5 (or ESX 3i version 3.5) or later hosts.
  • Each host’s vMotion networking link must be working correctly. The vMotion network should also be a single IP subnet, not multiple subnets separated by routers
  • The vMotion NIC on each host must support WOL. To check for WOL support, first determine the name of the physical network adapter corresponding to the VMkernel port by selecting the host in the inventory panel of the vSphere Client, selecting the Configuration tab, and clicking Network Adapters

wol

  • The Wake On LAN Supported column for the relevant adapter must show Yes (select the host in the inventory panel of the vSphere Client, select the Configuration tab, and click Network Adapters to display this column).
  • The switch port that each WOL-supporting vMotion NIC is plugged into should be set to auto negotiate the link speed, and not set to a fixed speed (for example, 1000 Mb/s). Many NICs support WOL only if they can switch to 100 Mb/s or less when the host is powered off.
  • After you verify these prerequisites, test each ESXi host that is going to use WOL to support vSphere DPM.
  • When you test these hosts, ensure that the vSphere DPM feature is disabled for the cluster
  • CAUTION: Ensure that any host being added to a vSphere DPM cluster that uses WOL as a wake protocol is tested, and disable it from using power management if it fails the testing. If this is not done, vSphere DPM might power off hosts that it subsequently cannot power back up.

Procedure

  • Click the Enter Standby Mode command on the host’s Summary tab in the vSphere Client.
  • This action powers down the host.
  • Try to bring the host out of standby mode by clicking the Power On command on the host’s Summary tab.
  • Observe whether or not the host successfully powers back on.
  • For any host that fails to exit standby mode successfully, select the host in the cluster Settings dialog box’s Host Options page and change its Power Management setting to Disabled.
  • After you do this, vSphere DPM does not consider that host a candidate for being powered off.
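The standby/power-on test can also be driven through the API, as in the hedged sketch below; the timeout values are illustrative and you would normally wait for each task to finish before moving on.

```python
# Hedged sketch: cycle a host through standby and back to test its wake protocol.
import time

def wait_for(task):
    """Poll a vCenter task until it completes."""
    while task.info.state not in ("success", "error"):
        time.sleep(5)
    return task.info.state

def test_standby_cycle(host):
    # evacuatePoweredOffVms=True also migrates powered-off/suspended VMs,
    # which is required when the host is in a DRS cluster
    enter = host.EnterStandbyMode_Task(timeoutSec=300, evacuatePoweredOffVms=True)
    print("Enter standby:", wait_for(enter))
    wake = host.PowerUpHostFromStandBy_Task(timeoutSec=300)
    print("Exit standby:", wait_for(wake))
```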

Enabling vSphere DPM for a DRS Cluster

After you have performed configuration or testing steps required by the wake protocol you are using on each host, you can enable vSphere DPM.
Configure the power management automation level, threshold, and host-level overrides. These settings are configured under Power Management in the cluster’s Settings dialog box.

If a host in your DRS cluster has USB devices connected, disable DPM for that host. Otherwise, DPM might turn off the host and sever the connection between the device and the virtual machine that was using it.

Instructions

  • Right click the cluster and select Edit Settings
  • Select Power Management

DPM

  • These priority ratings are based on the amount of over- or under-utilization found in the DRS cluster and the improvement that is expected from the intended host power state change. A priority-one recommendation is mandatory, while a priority-five recommendation brings only slight improvement.
  • The DRS threshold and the vSphere DPM threshold are essentially independent. You can differentiate the aggressiveness of the migration and host-power-state recommendations they respectively provide
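Scripting the equivalent of this Power Management page is sketched below with pyVmomi; the threshold comment is deliberately vague because the API rate value does not necessarily map one-to-one onto the UI slider, so verify against the API documentation.

```python
# Hedged sketch: enable DPM on a cluster via the cluster reconfigure API.
from pyVmomi import vim

def enable_dpm(cluster, automatic=True, threshold=3):
    dpm = vim.cluster.DpmConfigInfo(
        enabled=True,
        defaultDpmBehavior="automated" if automatic else "manual",
        hostPowerActionRate=threshold)   # DPM threshold, 1-5; check the API docs
                                         # for how this maps to the UI slider
    spec = vim.cluster.ConfigSpecEx(dpmConfig=dpm)
    return cluster.ReconfigureComputeResource_Task(spec=spec, modify=True)
```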

Host-Level Overrides

When you enable vSphere DPM in a DRS cluster, by default all hosts in the cluster inherit its vSphere DPM automation level.
You can override this default for an individual host by selecting the Host Options page of the cluster’s Settings dialog box and clicking its Power Management setting. You can change this setting to the following options:

  • Disabled
  • Manual
  • Automatic

NOTE: Do not change a host’s Power Management setting if it has been set to Disabled due to failed exit standby mode testing.

DPM2

  • After enabling and running vSphere DPM, you can verify that it is functioning properly by viewing each host’s Last Time Exited Standby information displayed on the Host Options page in the cluster Settings dialog box and on the Hosts tab for each cluster. This field shows a timestamp and whether vCenter Server Succeeded or Failed the last time it attempted to bring the host out of standby mode. If no such attempt has been made, the field displays Never.

NOTE: Times for the Last Time Exited Standby text box are derived from the vCenter Server event log. If this log is cleared, the times are reset to Never.

Power Management Techniques

After VMware DPM has determined the number of hosts needed to handle the load and to satisfy all relevant constraints and VMware DRS has distributed virtual machines across the hosts in keeping with resource allocation constraints and objectives, each individual powered-on host is free to handle power management of its hardware. For CPU power management, ESX 3.5 and 4 place idle CPUs in C1 halt state. ESX 4 also has support for host-level power-saving mechanisms through changing ACPI P-states; also known as dynamic voltage and frequency scaling (DVFS). DVFS runs CPUs at a lower speed and possibly at a lower voltage when there is sufficient excess capacity where the workload will not be affected. DVFS is “off” by default but can be turned on by setting the Power.CpuPolicy advanced option to “dynamic” for the hardware that supports it. Host-level power management is synergistic with VMware DPM. Even though it can provide additional power savings beyond VMware DPM, it cannot save as much power as VMware DPM does by powering hosts down completely.

vSphere DPM Powers off the host when the cluster load is low

  • DPM considers a 40 minute load history
  • Migrates all VMs to other hosts

vSphere DPM powers on the host when the cluster load is high

  • DPM considers a 5 minute load history
  • Wake up packets are sent to the host which boots up
  • DRS initiates and some VMs are migrated to this host

VMware DPM Operation

The goal of VMware DPM is to keep the utilization of ESX hosts in the cluster within a target range, subject to the constraints specified by the VMware DPM operating parameters and those associated with VMware HA and VMware DRS. VMware DPM evaluates recommending host power-on operations when there are hosts whose utilization is above this range and host power-off operations when there are hosts whose utilization is below it. Although this approach might seem relatively straightforward, there are key challenges that VMware DPM must overcome to be an effective power-saving solution. These include the following:

  • Accurately assess workload resource demands. Overestimating can lead to less than ideal power savings. Underestimating can result in poor performance and violations of VMware DRS resource-level SLAs.
  • Avoid powering servers on and off frequently, even if running workloads are highly variable. Powering servers on and off too often impairs performance because it requires superfluous VMotion operations.
  • React rapidly to sudden increase in workload demands so that performance is not sacrificed when saving power.
  • Select the appropriate hosts to power on or off. Powering off a larger host with numerous virtual machines might violate the target utilization range on one or more smaller hosts.
  • Redistribute virtual machines intelligently after hosts are powered on and off by seamlessly leveraging VMware DRS.

VMware DPM is run as part of the periodic VMware DRS invocation (every five minutes by default), immediately after the core VMware DRS cluster analysis and rebalancing step is complete. VMware DRS itself may recommend host power-on operations, if the additional capacity is needed as a prerequisite for migration recommendations to honor VMware HA or VMware DRS constraints, to handle user requests involving host evacuation (such as maintenance mode), or to place newly powered-on virtual machines.

Evaluating Utilization

VMware DPM evaluates the CPU and memory resource utilization of each ESX host and aims to keep the host’s resource utilization within a target utilization range. VMware DPM may take appropriate action when the host’s utilization falls outside the target range. The target utilization range is defined as:

Target resource utilization range = DemandCapacityRatioTarget ± DemandCapacityRatioToleranceHost
By default, the utilization range is 45% to 81% (that is, 63% ±18%)

DPM3

Each ESX host’s resource utilization is calculated as demand/capacity for each resource (CPU and memory). In this calculation, demand is the total amount of the resource needed by the virtual machines currently running on the ESX host and capacity is the total amount of the resource currently available on the ESX host. A virtual machine’s demand includes both its actual usage and an estimate of its unsatisfied demand, to account for cases in which the demand value is constrained by the ESX host’s available resources. If an ESX host faces heavy contention for its resources, its demand can exceed 100 percent. VMware DPM computes actual memory usage using a statistical sampling estimate of the virtual machine’s working set size. It also computes the estimate of unsatisfied demand for memory using a heuristic technique.

VMware DPM calculates an ESX host’s resource demand as the aggregate demand over all the virtual machines running on that host. It calculates a virtual machine’s demand as its average demand over a historical period of time plus two standard deviations (capped at the virtual machine’s maximum demand observed over that period). Using a virtual machine’s average demand over a period of time, rather than simply its current demand, is intended to ensure that the demand used in the calculation is not anomalous. This approach also smoothes out any intermediate demand spikes that might lead to powering hosts on and off too frequently. The default period of time VMware DPM evaluates when it calculates average demand that may lead to host power-on recommendations is the past 300 seconds (five minutes). When it calculates average demand for host power-off recommendations, the default period of time VMware DPM evaluates is the past 2400 seconds (40 minutes). The default time period for evaluating host power-on recommendations is shorter because rapid reactions to power on hosts are considered more important than rapid reactions to power off hosts. In other words, providing the necessary resources for workload demands has a higher priority than saving power.
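The following toy Python snippet simply restates that arithmetic: the default target range of 63% ±18%, and a per-VM demand of mean plus two standard deviations, capped at the observed maximum. The sample values are invented purely for illustration.

```python
# Toy illustration of the DPM utilisation maths described above (not VMware code).
import statistics

TARGET = 0.63
TOLERANCE = 0.18
LOW, HIGH = TARGET - TOLERANCE, TARGET + TOLERANCE   # 0.45 .. 0.81

def vm_demand(samples):
    """Mean demand over the window plus two standard deviations, capped at the max."""
    mean = statistics.mean(samples)
    stdev = statistics.pstdev(samples)
    return min(mean + 2 * stdev, max(samples))

def host_utilisation(vm_sample_sets, host_capacity):
    demand = sum(vm_demand(s) for s in vm_sample_sets)
    return demand / host_capacity

# Two VMs' CPU demand samples (MHz) over the evaluation window, on a 6000 MHz host
util = host_utilisation([[800, 900, 1500, 1000], [400, 450, 500, 480]], 6000)
print(f"Utilisation {util:.0%}; below target range: {util < LOW}")
```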

If any host’s CPU or memory resource utilization during the period evaluated for host power-on recommendations is above the target utilization range, VMware DPM evaluates powering hosts on. If any host’s CPU and any host’s memory resource utilization over the period evaluated for host power-off recommendations is below the target utilization range and there are no recommendations to power hosts on, VMware DPM evaluates powering hosts off.
In addition, when VMware DPM runs VMware DRS in what-if mode to evaluate the impact of host power-on and power-off, CPU and memory reservations are taken into account (as well as all other cluster constraints). VMware DRS will reject proposed host power-off recommendations that will violate reservations and VMware DRS will initiate host power-on operations to satisfy reservations.

Useful Link

http://www.vmware.com/files/pdf/Distributed-Power-Management-vSphere.pdf