Fault Tolerance

February 5, 2013 Objective 4 Business Continuity

What is Fault Tolerance?

Fault Tolerance (FT) is the evolution of continuous availability. It uses VMware vLockstep technology to keep a Primary and a Secondary virtual machine in sync, and is based on the record/playback technology used in VMware Workstation. The Primary VM streams its non-deterministic events to the Secondary VM, which replays them deterministically. The Secondary therefore matches the Primary instruction for instruction and memory for memory, producing identical processing.

Deterministic means that the processor executes the same instruction stream on the Secondary VM as on the Primary VM.

Non-deterministic events are inputs such as network, disk, mouse and keyboard activity, as well as hardware interrupts. These are recorded on the Primary VM and played back on the Secondary VM.
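
The record/playback idea can be illustrated with a toy model. This is only a sketch and not how vLockstep is implemented: the run_vm, primary and secondary functions, the event kinds and the arithmetic are all made up for illustration. The point it shows is that if the guest is deterministic apart from its inputs, replaying the same ordered event log yields the same state.

```python
import random

def run_vm(inputs):
    """Toy 'guest': its final state depends only on the ordered inputs."""
    state = 0
    for kind, value in inputs:
        if kind == "net":
            state = (state * 31 + value) % 1_000_003
        elif kind == "key":
            state = (state + value * 7) % 1_000_003
    return state

def primary(seed):
    """Primary: observes non-deterministic events and records them in order."""
    rng = random.Random(seed)  # stands in for real device and network timing
    log = [(rng.choice(["net", "key"]), rng.randrange(256)) for _ in range(20)]
    return run_vm(log), log

def secondary(log):
    """Secondary: never touches the real devices, it only replays the log."""
    return run_vm(log)

primary_state, event_log = primary(seed=42)
secondary_state = secondary(event_log)
print(primary_state == secondary_state)  # True: both processed the same inputs
```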

The Primary and Secondary VMs continuously exchange heartbeats. This exchange allows the virtual machine pair to monitor each other's status and ensures that Fault Tolerance is continually maintained. A transparent failover occurs if the host running the Primary VM fails, in which case the Secondary VM is immediately activated to replace the Primary VM. A new Secondary VM is started and Fault Tolerance redundancy is reestablished within a few seconds. If the host running the Secondary VM fails, it is also immediately replaced. In either case, users experience no interruption in service and no loss of data.

Fault Tolerance avoids “split-brain” situations, which can lead to two active copies of a virtual machine after recovery from a failure. Atomic file locking on shared storage is used to coordinate failover so that only one side continues running as the Primary VM and a new Secondary VM is respawned automatically.
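
To see why an atomic lock prevents two active copies, consider the minimal sketch below. It is an analogy only: the try_become_primary function, the lock file name and the use of a local temporary directory in place of shared storage are assumptions for illustration, not VMware's actual mechanism. The key property is that creating the file with O_CREAT and O_EXCL either succeeds for exactly one caller or fails.

```python
import os
import tempfile

def try_become_primary(lock_path, vm_name):
    """Atomically create the lock file; only one caller can succeed."""
    try:
        fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
    except FileExistsError:
        return False  # the other side already won the race
    with os.fdopen(fd, "w") as f:
        f.write(vm_name)  # record the winner for inspection
    return True

# A temporary directory stands in for the shared datastore.
shared_dir = tempfile.mkdtemp()
lock = os.path.join(shared_dir, "ft-failover.lock")

# Both sides believe the other has failed and try to take over.
results = {name: try_become_primary(lock, name) for name in ("vm-a", "vm-b")}
print(results)  # {'vm-a': True, 'vm-b': False}
```

Whichever side loses the race knows that a Primary already exists and must not continue running as one.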

Use Cases

Applications that need to be available at all times, especially those that have long-lasting client connections that users want to maintain during hardware failure.
Custom applications that have no other way of doing clustering.
Cases where high availability might be provided through custom clustering solutions, which are too complicated to configure and maintain.
On-demand protection for VMs running end-of-month reports or financials.

Best Practices for Fault Tolerance

To ensure optimal Fault Tolerance results, VMware recommends that you follow certain best practices. In addition to the following information, see the white paper VMware Fault Tolerance Recommendations and Considerations at http://www.vmware.com/resources/techresources/10040

Requirements for FT

Cluster Requirements
Host Requirements
VM Requirements

Cluster Requirements

Host certificate checking must be enabled. This is the default in vSphere 4.1, but you may need to enable it manually (vCenter Server Settings > SSL Settings > select vCenter requires verified host SSL certificates).
The cluster must have at least two ESXi hosts running the same FT version or build number.
HA must be enabled on the cluster.
EVC must be enabled if you want to use FT in conjunction with DRS. Without EVC, DRS is disabled for the FT virtual machines.
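
The cluster requirements above can be expressed as a simple pre-flight check. This is a sketch only: the dictionary layout, the host names and the FT version strings are hypothetical, and in practice the values would have to be collected from vCenter by some other means.

```python
def cluster_ft_problems(cluster):
    """Return a list of cluster requirements that are not met."""
    problems = []
    if not cluster["host_cert_checking"]:
        problems.append("host certificate checking is disabled")
    if len(cluster["host_ft_versions"]) < 2:
        problems.append("fewer than two hosts")
    if len(set(cluster["host_ft_versions"].values())) > 1:
        problems.append("hosts run different FT versions")
    if not cluster["ha_enabled"]:
        problems.append("HA is not enabled")
    if cluster["drs_enabled"] and not cluster["evc_enabled"]:
        problems.append("EVC is off, so DRS cannot manage FT VMs")
    return problems

# Hypothetical cluster description; the version strings are placeholders.
example_cluster = {
    "host_cert_checking": True,
    "ha_enabled": True,
    "drs_enabled": True,
    "evc_enabled": False,
    "host_ft_versions": {"esxi01": "2.0.1", "esxi02": "2.0.1"},
}
print(cluster_ft_problems(example_cluster))
# ['EVC is off, so DRS cannot manage FT VMs']
```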

Host Requirements

The ESXi hosts must have access to the same datastores and networks
The ESXi hosts must have an FT Logging network set up.
The FT Logging network must have at least 1 Gbit/s connectivity.
NICs can be shared if necessary.
The ESXi hosts' CPUs must be FT compatible.
Hosts must be licensed for FT.
Hardware Virtualisation must be enabled in the BIOS of the hosts to enable CPU support for FT.
It is recommended that Power Management is turned off in the BIOS. This helps ensure uniformity in the CPU speeds.

VM Requirements

Only VMs with a single vCPU are supported.
VMs must be running a supported guest OS.
VMs must be stored on shared storage available to all hosts.
FC, iSCSI, FCoE and NFS are supported.
A VM's disks must be in eager zeroed thick format or be virtual mode RDMs (physical mode RDMs are not supported).
The VM must have no snapshots.
The VM must not be a linked clone.
No USB devices, sound devices, serial ports or parallel ports may be configured.
The VM cannot use NPIV.
Nested Page Tables/Extended Page Tables are not supported.
The VM cannot use NIC passthrough.
The VM cannot use the older vlance drivers.
No CD-ROM or floppy devices may be attached.
The VM cannot use a paravirtualised kernel.
VMs must use the correct monitor mode.
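
Several of the VM requirements above lend themselves to an automated check before FT is turned on. The sketch below assumes a hypothetical dictionary describing the VM; the field names and device labels are invented for illustration and do not correspond to any VMware API. It covers only a subset of the list.

```python
UNSUPPORTED_DEVICES = {"usb", "sound", "serial", "parallel", "cdrom", "floppy"}
SUPPORTED_DISKS = {"eagerzeroedthick", "virtual_rdm"}

def vm_ft_problems(vm):
    """Return the VM requirements from the list above that are violated."""
    problems = []
    if vm["vcpus"] != 1:
        problems.append("vCPU count is not 1")
    for disk in vm["disks"]:
        if disk not in SUPPORTED_DISKS:
            problems.append(f"unsupported disk type: {disk}")
    if vm["snapshots"] > 0:
        problems.append("VM has snapshots")
    if vm["linked_clone"]:
        problems.append("VM is a linked clone")
    blocked = sorted(UNSUPPORTED_DEVICES & set(vm["devices"]))
    if blocked:
        problems.append("unsupported devices: " + ", ".join(blocked))
    if vm["nic_driver"] == "vlance":
        problems.append("vlance NIC driver in use")
    return problems

# Hypothetical VM description.
example_vm = {
    "vcpus": 2,
    "disks": ["thin", "eagerzeroedthick"],
    "snapshots": 0,
    "linked_clone": False,
    "devices": ["cdrom"],
    "nic_driver": "vmxnet3",
}
print(vm_ft_problems(example_vm))
# ['vCPU count is not 1', 'unsupported disk type: thin', 'unsupported devices: cdrom']
```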

Caveats

You can use vMotion but not Storage vMotion, and therefore not Storage DRS (SDRS).
Hot plugging is not allowed.
You cannot change the network settings while the VM is powered on.
Because snapshots are not supported, you cannot use any backup mechanism that relies on snapshots. To work around this, disable FT before backing up.

Configure FT Networking for Host Machines

On each host that you want to add to a vSphere HA cluster, you must configure two different networking switches so that the host can also support vSphere Fault Tolerance.
To enable Fault Tolerance for a host, you must complete this procedure twice, once for each port group option to ensure that sufficient bandwidth is available for Fault Tolerance logging. Select one option, finish this procedure, and repeat the procedure a second time, selecting the other port group option.

Prerequisites

Multiple gigabit Network Interface Cards (NICs) are required. For each host supporting Fault Tolerance, you need a minimum of two physical gigabit NICs. For example, you need one dedicated to Fault Tolerance logging and one dedicated to vMotion.
VMware recommends three or more NICs to ensure availability.
The vMotion and FT logging NICs must be on different subnets.
IPv6 is not supported on the FT logging NIC.
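
The two addressing rules above (different subnets, no IPv6 on FT logging) are easy to check with Python's standard ipaddress module. A minimal sketch follows; the function name and the example addresses are made up for illustration.

```python
import ipaddress

def ft_network_problems(vmotion_if, ft_logging_if):
    """Each argument is an 'address/prefix' string, e.g. '10.0.1.11/24'."""
    vmotion = ipaddress.ip_interface(vmotion_if)
    ft = ipaddress.ip_interface(ft_logging_if)
    problems = []
    if ft.version == 6:
        problems.append("FT logging NIC uses IPv6")
    if vmotion.network == ft.network:
        problems.append("vMotion and FT logging share a subnet")
    return problems

print(ft_network_problems("10.0.1.11/24", "10.0.2.11/24"))  # []
print(ft_network_problems("10.0.1.11/24", "10.0.1.12/24"))
# ['vMotion and FT logging share a subnet']
```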

Procedure

Connect vSphere Client to vCenter Server.
In the vCenter Server inventory, select the host and click the Configuration tab.
Select Networking under Hardware, and click the Add Networking link.
The Add Network wizard appears.
Select VMkernel under Connection Types and click Next.
Select Create a virtual switch and click Next.
Provide a label for the switch.
Select either Use this port group for vMotion or Use this port group for Fault Tolerance logging and click Next.
Provide an IP address and subnet mask and click Next.
Click Finish.

Networking Example

vMotion and FT Logging can share the same VLAN (configure the same VLAN number in both port groups), but require their own unique IP addresses residing in different IP subnets. However, separate VLANs might be preferred if Quality of Service (QoS) restrictions are in effect on the physical network with VLAN based QoS. QoS is of particular use where competing traffic comes into play, for example, where multiple physical switch hops are used or when a failover occurs and multiple traffic types compete for network resources.

This example uses four port groups configured as follows:

VLAN A: Virtual Machine Network Port Group, active on vmnic2 (to physical switch #1); standby on vmnic0 (to physical switch #2).
VLAN B: Management Network Port Group, active on vmnic0 (to physical switch #2); standby on vmnic2 (to physical switch #1).
VLAN C: vMotion Port Group, active on vmnic1 (to physical switch #2); standby on vmnic3 (to physical switch #1).
VLAN D: FT Logging Port Group, active on vmnic3 (to physical switch #1); standby on vmnic1 (to physical switch #2).
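
The design choice in this layout is that each port group's active and standby NICs connect to different physical switches, and no two port groups share an active NIC. The sketch below encodes the four port groups as plain data and checks both properties; the data structures and function names are illustrative only.

```python
# Which physical switch each uplink connects to, taken from the example above.
UPLINK_SWITCH = {"vmnic0": 2, "vmnic1": 2, "vmnic2": 1, "vmnic3": 1}

# Port group -> (active uplink, standby uplink).
PORT_GROUPS = {
    "Virtual Machine Network": ("vmnic2", "vmnic0"),
    "Management Network": ("vmnic0", "vmnic2"),
    "vMotion": ("vmnic1", "vmnic3"),
    "FT Logging": ("vmnic3", "vmnic1"),
}

def survives_switch_failure(port_groups, uplink_switch):
    """True if every port group's active and standby NICs use different switches."""
    return all(
        uplink_switch[active] != uplink_switch[standby]
        for active, standby in port_groups.values()
    )

def active_uplinks_are_dedicated(port_groups):
    """True if no two port groups share the same active NIC."""
    actives = [active for active, _ in port_groups.values()]
    return len(actives) == len(set(actives))

print(survives_switch_failure(PORT_GROUPS, UPLINK_SWITCH))  # True
print(active_uplinks_are_dedicated(PORT_GROUPS))            # True
```

Under normal conditions each traffic type has its own NIC, and the loss of either physical switch moves two port groups onto their standby NICs rather than cutting them off.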

Instructions for setup

Connect to vCenter using the vSphere Client or Web Client.
Right click the VM you want to use for FT and select Fault Tolerance > Turn on Fault Tolerance.

A confirmation message is then displayed.

vSphere Fault Tolerance Configuration Recommendations

VMware recommends that you observe certain guidelines when configuring Fault Tolerance.

In addition to non-fault tolerant virtual machines, you should have no more than four fault tolerant virtual machines (primaries or secondaries) on any single host. The number of fault tolerant virtual machines that you can safely run on each host is based on the sizes and workloads of the ESXi host and virtual machines, all of which can vary.
If you are using NFS to access shared storage, use dedicated NAS hardware with at least a 1Gbit NIC to obtain the network performance required for Fault Tolerance to work properly.
Ensure that a resource pool containing fault tolerant virtual machines has excess memory above the memory size of the virtual machines. The memory reservation of a fault tolerant virtual machine is set to the virtual machine’s memory size when Fault Tolerance is turned on. Without this excess in the resource pool, there might not be any memory available to use as overhead memory.
Use a maximum of 16 virtual disks per fault tolerant virtual machine.
To ensure redundancy and maximum Fault Tolerance protection, you should have a minimum of three hosts in the cluster. In a failover situation, this provides a host that can accommodate the new Secondary VM that is created.
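
The numeric guidelines above (no more than four FT VMs per host, at most 16 virtual disks per FT VM, at least three hosts) can be turned into a quick sanity check. This is a sketch under assumed inputs: the two dictionaries and the host and VM names are hypothetical, and the limits are simply the figures quoted in the recommendations.

```python
MAX_FT_VMS_PER_HOST = 4
MAX_DISKS_PER_FT_VM = 16
MIN_HOSTS = 3

def recommendation_warnings(hosts, ft_vm_disks):
    """hosts: host name -> number of FT VMs (primaries or secondaries) on it.
    ft_vm_disks: FT VM name -> number of virtual disks."""
    warnings = []
    if len(hosts) < MIN_HOSTS:
        warnings.append(
            f"cluster has {len(hosts)} hosts; {MIN_HOSTS} or more recommended"
        )
    for host, count in sorted(hosts.items()):
        if count > MAX_FT_VMS_PER_HOST:
            warnings.append(
                f"{host} runs {count} FT VMs; {MAX_FT_VMS_PER_HOST} or fewer recommended"
            )
    for vm, disks in sorted(ft_vm_disks.items()):
        if disks > MAX_DISKS_PER_FT_VM:
            warnings.append(
                f"{vm} has {disks} virtual disks; {MAX_DISKS_PER_FT_VM} or fewer recommended"
            )
    return warnings

for line in recommendation_warnings({"esxi01": 5, "esxi02": 2}, {"db01": 17, "app01": 3}):
    print(line)
```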

Tags:
FT
