Test FT failover, secondary restart and app fault tolerance in an FT VM

Fault Tolerance failure scenarios

Fault Tolerance failovers are triggered only when communication between the Primary and Secondary VMs is lost.


Three scenarios may occur

Deterministic

These are events where you can predict how the failover will occur

  • An ESXi host fails which causes complete host failover
  • The Primary VM process fails or becomes unresponsive on the ESXi host
  • A Fault Tolerance test is initiated from vCenter Server

Reactionary

This is where a failover may occur but you don’t know the outcome ahead of time. These events are not predictable because the Primary and Secondary VMs race to decide which one becomes the live VM. The race prevents a split-brain scenario, which could cause data corruption

  • The Fault Tolerant NIC is interrupted or fails
  • The Fault Tolerant NIC communication is very slow
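
The race described above can be illustrated with a small sketch. It does not reproduce how Fault Tolerance actually arbitrates between the two VMs; the `arbiter` lock and the `run_race` function are hypothetical names, used only to show why a single atomic claim yields exactly one live VM even though you cannot predict which one it will be.

```python
import threading


def run_race(contenders=("primary", "secondary")):
    """Let each contender try to claim one shared arbiter.

    Returns (winner, losers). Which contender wins depends on thread
    scheduling, so the outcome is not predictable ahead of time.
    """
    arbiter = threading.Lock()   # hypothetical stand-in for a shared arbiter
    record = threading.Lock()    # protects the result lists below
    winners, losers = [], []

    def contend(name):
        # Non-blocking acquire: the arbiter is never released, so only
        # the first contender to reach it can succeed.
        claimed = arbiter.acquire(blocking=False)
        with record:
            (winners if claimed else losers).append(name)

    threads = [threading.Thread(target=contend, args=(n,)) for n in contenders]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners[0], losers


if __name__ == "__main__":
    winner, losers = run_race()
    print("live VM:", winner, "| stood down:", losers)
```

However the threads are scheduled, one contender wins and the other stands down, which is the property that rules out a split brain.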

No action taken

This is where no failover occurs because Fault Tolerance does not monitor for this type of event

  • Management network interruption or failure
  • VM network interruption or failure
  • HBA failures that do not affect the entire host
  • Any combination of the above
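
The three scenarios can be captured as a simple lookup table, which may be handy when writing a test plan. This is only a sketch: the event keys and the `classify` and `is_ignored` helpers are names made up for illustration and are not part of any VMware API.

```python
from enum import Enum


class Outcome(Enum):
    DETERMINISTIC = "deterministic"
    REACTIONARY = "reactionary"
    NO_ACTION = "no action taken"


# Hypothetical event keys, one per bullet in the lists above.
FT_EVENT_OUTCOMES = {
    "esxi_host_failure": Outcome.DETERMINISTIC,
    "primary_vm_process_failure": Outcome.DETERMINISTIC,
    "test_failover_from_vcenter": Outcome.DETERMINISTIC,
    "ft_nic_failure": Outcome.REACTIONARY,
    "ft_nic_slow": Outcome.REACTIONARY,
    "management_network_failure": Outcome.NO_ACTION,
    "vm_network_failure": Outcome.NO_ACTION,
    "partial_hba_failure": Outcome.NO_ACTION,
}


def classify(event):
    """Return the scenario type for a single event key."""
    return FT_EVENT_OUTCOMES[event]


def is_ignored(events):
    """True when every event in the combination is one FT does not monitor."""
    return all(classify(e) is Outcome.NO_ACTION for e in events)


if __name__ == "__main__":
    print(classify("ft_nic_slow").value)
    print(is_ignored(["management_network_failure", "vm_network_failure"]))
```

The `is_ignored` helper mirrors the "any combination of the above" bullet: a combination triggers no action only when every event in it belongs to the unmonitored group.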

Testing Fault Tolerance

VMware provides a Test Failover function on the VM's Fault Tolerance menu, which is the best option for testing

Three tests

  • Select the Test Failover Function from the Fault Tolerance menu on the Primary VM

This tests the Fault Tolerance functionality in a fully supported and non-invasive way. In this scenario, the virtual machine fails over from Host A to Host B, and a new Secondary VM is started. No VMware HA failover occurs in this case

  • Host Failure

This can be accomplished by pulling the power cord of the host, rebooting the host, or powering off the host from a remote management interface such as iLO, DRAC, IMM or RSA. The Secondary VM on Host B takes over immediately and continues to process information for the VM. VMware HA is triggered in this case

  • Virtual Machine process on Host A fails

This scenario can be accomplished by logging into Host A and terminating the active process for the VM. The Secondary VM takes over and no VMware HA failover occurs. VMware does not recommend testing in this way
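
The three tests and their expected outcomes can be written down as a small checklist to compare against what you observe. This is a sketch only: the `FtTest` record, its field names and the test keys are hypothetical, and the guidance strings simply restate what is said above (nothing is stated here about whether the host failure test is recommended).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FtTest:
    name: str                # hypothetical key for the test
    ha_event_expected: bool  # does VMware HA get involved?
    guidance: str            # what the text above says about the method


FT_TESTS = (
    FtTest("test_failover_menu", False, "best option"),
    FtTest("host_failure", True, "not stated"),
    FtTest("kill_primary_vm_process", False, "not recommended"),
)


def tests_expecting_ha(tests=FT_TESTS):
    """Names of the tests after which a VMware HA event should be seen."""
    return [t.name for t in tests if t.ha_event_expected]


if __name__ == "__main__":
    for t in FT_TESTS:
        print(f"{t.name}: HA expected={t.ha_event_expected}, {t.guidance}")
```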
