Failover Clusters in Windows Server 2008 – Quorums

What is a cluster?

A failover cluster is a group of independent computers that work together to increase the availability of applications and services. The clustered servers (called nodes) are connected by physical cables and by software. If one of the cluster nodes fails, another node begins to provide service (a process known as failover). Users experience a minimum of disruptions in service.

Are there any special considerations?

Microsoft supports a failover cluster solution only if all the hardware components are marked as “Certified for Windows Server 2008 R2.” In addition, the complete configuration (servers, network, and storage) must pass all tests in the Validate a Configuration wizard, which is included in the Failover Cluster Manager snap-in.

Note that this policy differs from the support policy for server clusters in Windows Server 2003, which required the entire cluster solution to be listed in the Windows Server Catalog under Cluster Solutions.

Cluster validation is intended to catch hardware or configuration problems before the cluster goes into production. Cluster validation helps to ensure that the solution you are about to deploy is truly dependable. Cluster validation can also be performed on configured failover clusters as a diagnostic tool.

Step by Step Guide

  • Run the cluster validation wizard for a failover cluster
  • If the cluster does not yet exist, choose the servers that you want to include in the cluster, and make sure you have installed the failover cluster feature on those servers. To install the feature, on a server running Windows Server 2008 or Windows Server 2008 R2, click Start, click Administrative Tools, click Server Manager, and under Features Summary, click Add Features. Use the Add Features wizard to add the Failover Clustering feature.
  • If the cluster already exists, make sure that you know the name of the cluster or a node in the cluster.
  • For a planned cluster with all hardware connected: Run all tests.
  • For a planned cluster with parts of the hardware connected: Run System Configuration tests, Inventory tests, and tests that apply to the hardware that is connected (that is, Network tests if the network is connected or Storage tests if the storage is connected).
  • For a cluster to which you plan to add a server: Run all tests. Before you run them, be sure to connect the networks and storage for all servers that you plan to have in the cluster.
  • For troubleshooting an existing cluster: If you are troubleshooting an existing cluster, you might run all tests, although you could run only the tests that relate to the apparent issue.
  • In the failover cluster snap-in, in the console tree, make sure Failover Cluster Management is selected and then, under Management, click Validate a Configuration.

  • Follow the instructions in the wizard to specify the servers and the tests, and run the tests.
  • Note that when you run the cluster validation wizard on unclustered servers, you must enter the names of all the servers you want to test, not just one.
  • The Summary page appears after the tests run.
  • While still on the Summary page, click View Report to view the test results. To view the results of the tests after you close the wizard, see SystemRoot\Cluster\Reports\Validation Report date and time.html, where SystemRoot is the folder in which the operating system is installed (for example, C:\Windows). On Windows Server 2008 R2, you can also run validation from PowerShell, as sketched below.
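
On Windows Server 2008 R2, the same validation can be driven from PowerShell with the FailoverClusters module. A minimal sketch, assuming the feature is installed; the node and cluster names are placeholders:

  # Load the failover clustering cmdlets (Windows Server 2008 R2)
  Import-Module FailoverClusters

  # Validate prospective nodes before the cluster exists (names are placeholders)
  Test-Cluster -Node "node1.contoso.com", "node2.contoso.com"

  # Or validate an existing cluster by name
  Test-Cluster -Cluster "Cluster01"

As with the wizard, Test-Cluster produces a validation report file that you can review afterwards.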

Configuring the Quorum in a Failover Cluster

In simple terms, the quorum for a cluster is the number of elements that must be online for that cluster to continue running. In effect, each element can cast one “vote” to determine whether the cluster continues running. The voting elements are nodes or, in some cases, a disk witness or file share witness. Each voting element (with the exception of a file share witness) contains a copy of the cluster configuration, and the Cluster service works to keep all copies synchronized at all times.

Note that the full function of a cluster depends not just on quorum, but on the capacity of each node to support the services and applications that fail over to that node. For example, a cluster that has five nodes could still have quorum after two nodes fail, but each remaining cluster node would continue serving clients only if it had enough capacity to support the services and applications that failed over to it.

Why Quorum is necessary

When network problems occur, they can interfere with communication between cluster nodes. A small set of nodes might be able to communicate together across a functioning part of a network, but might not be able to communicate with a different set of nodes in another part of the network. This can cause serious issues. In this “split” situation, at least one of the sets of nodes must stop running as a cluster.

To prevent the issues that are caused by a split in the cluster, the cluster software requires that any set of nodes running as a cluster must use a voting algorithm to determine whether, at a given time, that set has quorum. Because a given cluster has a specific set of nodes and a specific quorum configuration, the cluster will know how many “votes” constitute a majority (that is, a quorum). If the number drops below the majority, the cluster stops running. Nodes will still listen for the presence of other nodes, in case another node appears again on the network, but the nodes will not begin to function as a cluster until the quorum exists again.

For example, in a five-node cluster that is using Node Majority, consider what happens if nodes 1, 2, and 3 can communicate with each other but not with nodes 4 and 5. Nodes 1, 2, and 3 constitute a majority, and they continue running as a cluster. Nodes 4 and 5 are a minority and stop running as a cluster, which prevents the problems of a “split” situation. If node 3 then loses communication with the other nodes, no remaining set holds a majority of the five votes, so all nodes stop running as a cluster. However, all functioning nodes will continue to listen for communication, so that when the network begins working again, the cluster can form and begin to run.

Overview of the Quorum Modes

There have been significant improvements to the quorum model in Windows Server 2008. In Windows Server 2003, almost all server clusters used a disk in cluster storage (the “quorum resource”) as the quorum. If a node could communicate with the specified disk, the node could function as a part of a cluster, and otherwise it could not. This made the quorum resource a potential single point of failure. In Windows Server 2008, a majority of ‘votes’ is what determines whether a cluster achieves quorum. Nodes can vote, and where appropriate, either a disk in cluster storage (called a “disk witness”) or a file share (called a “file share witness”) can vote. There is also a quorum mode called No Majority: Disk Only which functions like the disk-based quorum in Windows Server 2003. Aside from that mode, there is no single point of failure with the quorum modes, since what matters is the number of votes, not whether a particular element is available to vote.

This new quorum model is flexible and you can choose the mode best suited to your cluster.

Important: In most situations, it is best to use the quorum mode selected by the cluster software. If you run the quorum configuration wizard, the quorum mode that the wizard lists as “recommended” is the quorum mode chosen by the cluster software. We only recommend changing the quorum configuration if you have determined that the change is appropriate for your cluster.

There are four quorum modes (a PowerShell sketch for configuring each follows the list):

  • Node Majority: Each node that is available and in communication can vote. The cluster functions only with a majority of the votes, that is, more than half.
  • Node and Disk Majority: Each node plus a designated disk in the cluster storage (the “disk witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
  • Node and File Share Majority: Each node plus a designated file share created by the administrator (the “file share witness”) can vote, whenever they are available and in communication. The cluster functions only with a majority of the votes, that is, more than half.
  • No Majority: Disk Only: The cluster has quorum if one node is available and in communication with a specific disk in the cluster storage.
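
On Windows Server 2008 R2, each of these modes can also be set from PowerShell with Set-ClusterQuorum. A minimal sketch; the cluster name, disk name, and share path are placeholders:

  Import-Module FailoverClusters

  # Node Majority (no witness)
  Set-ClusterQuorum -Cluster "Cluster01" -NodeMajority

  # Node and Disk Majority, with a clustered disk as the witness
  Set-ClusterQuorum -Cluster "Cluster01" -NodeAndDiskMajority "Cluster Disk 2"

  # Node and File Share Majority, with an external share as the witness
  Set-ClusterQuorum -Cluster "Cluster01" -NodeAndFileShareMajority "\\fileserver\witness"

  # No Majority: Disk Only (note the single point of failure)
  Set-ClusterQuorum -Cluster "Cluster01" -DiskOnly "Cluster Disk 2"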

Choosing the Quorum Mode for a particular cluster

Description of Cluster                                   Quorum Recommendation
Odd number of nodes                                      Node Majority
Even number of nodes (but not a multi-site cluster)      Node and Disk Majority
Even number of nodes, multi-site cluster                 Node and File Share Majority
Even number of nodes, no shared storage                  Node and File Share Majority

Node Majority

The following diagram shows Node Majority used (as recommended) for a cluster with an odd number of nodes. In this mode, each node gets one vote. In certain circumstances, you might want to install a hotfix that lets you select which nodes will have votes. This can be useful with certain multi-site clusters, for example, where you want one site to have more votes than other sites in a disaster recovery situation.
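
With that hotfix applied, a node’s vote can be removed through its NodeWeight property. A hedged sketch, assuming the hotfix is installed on every node; the node name is a placeholder:

  Import-Module FailoverClusters

  # Remove the vote from a node at the disaster recovery site
  # (NodeWeight requires the node-vote hotfix on Windows Server 2008/2008 R2)
  (Get-ClusterNode -Name "DRSite-Node1").NodeWeight = 0

  # Confirm which nodes currently hold votes
  Get-ClusterNode | Format-Table -Property Name, NodeWeight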

Node and Disk Majority

The following diagram shows Node and Disk Majority used (as recommended) for a cluster with an even number of nodes. Each node can vote, as can the disk witness. When you use a disk witness, follow these guidelines:

  • Use a small Logical Unit Number (LUN) that is at least 512 MB in size.
  • Choose a basic disk with a single volume.
  • Make sure that the LUN is dedicated to the disk witness. It must not contain any other user or application data.
  • Choose whether to assign a drive letter to the LUN based on the needs of your cluster. The LUN does not have to have a drive letter (to conserve drive letters for applications).
  • As with other LUNs that are to be used by the cluster, you must add the LUN to the set of disks that the cluster can use. For more information, see http://go.microsoft.com/fwlink/?LinkId=114539.
  • Make sure that the LUN has been verified with the Validate a Configuration Wizard.
  • We recommend that you configure the LUN with hardware RAID for fault tolerance.
  • In most situations, do not back up the disk witness or the data on it. Backing up the disk witness can add to the input/output (I/O) activity on the disk and decrease its performance, which could potentially cause it to fail.
  • We recommend that you avoid all antivirus scanning on the disk witness.
  • Format the LUN with the NTFS file system.

If there is a disk witness configured, but bringing that disk online will not achieve quorum, then it remains offline. If bringing that disk online will achieve quorum, then it is brought online by the cluster software.

Node and File Share Majority

The following diagram shows Node and File Share Majority used (as recommended) for a cluster with an even number of nodes and a situation where having a file share witness works better than having a disk witness. Each node can vote, as can the file share witness. When you use a file share witness, follow these guidelines:

  • Use a Server Message Block (SMB) share on a Windows Server 2003 or Windows Server 2008 file server.
  • Make sure that the file share has a minimum of 5 MB of free space.
  • Make sure that the file share is dedicated to the cluster and is not used in other ways (including storage of user or application data).
  • Do not place the share on a node that is a member of this cluster or will become a member of this cluster in the future.
  • You can place the share on a file server that has multiple file shares servicing different purposes. This may include multiple file share witnesses, each one a dedicated share. You can even place the share on a clustered file server (in a different cluster), which would typically be a clustered file server containing multiple file shares servicing different purposes.
  • For a multi-site cluster, you can co-locate the external file share at one of the sites where a node or nodes are located. However, we recommend that you configure the external share in a separate third site.
  • Place the file share on a server that is a member of a domain, in the same forest as the cluster nodes.
  • For the folder that the file share uses, make sure that the administrator has Full Control share and NTFS permissions.
  • Do not use a file share that is part of a Distributed File System (DFS) Namespace. A sketch for creating and configuring a file share witness follows this list.
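
A minimal sketch for standing up a file share witness and pointing the cluster at it. The server, folder, share, account, and cluster names are all placeholders, and depending on your environment the cluster name computer account may also need access to the share:

  # On the file server (not a member of this cluster): create and share the folder
  New-Item -Path "C:\FSW-Cluster01" -ItemType Directory
  net share "FSW-Cluster01=C:\FSW-Cluster01" "/GRANT:CONTOSO\Administrator,FULL"
  icacls "C:\FSW-Cluster01" /grant "CONTOSO\Administrator:(OI)(CI)F"

  # On a cluster node: make the share the file share witness
  Import-Module FailoverClusters
  Set-ClusterQuorum -Cluster "Cluster01" -NodeAndFileShareMajority "\\fileserver\FSW-Cluster01"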

No Majority: Disk Only

The following illustration shows how a cluster that uses the disk as the only determiner of quorum can run even if only one node is available and in communication with the quorum disk. It also shows how the cluster cannot run if the quorum disk is not available (single point of failure). For this cluster, which has an odd number of nodes, Node Majority is the recommended quorum mode.

The requirements and recommendations for the quorum disk in this mode are the same as those listed for the disk witness under Node and Disk Majority, earlier in this section.

Viewing the Quorum Configuration

  • To open the failover cluster snap-in, click Start, click Administrative Tools, and then click Failover Cluster Management (in Windows Server 2008) or Failover Cluster Manager (in Windows Server 2008 R2). If the User Account Control dialog box appears, confirm that the action it displays is what you want, and then click Continue.
  • In the console tree, if the cluster that you want to view is not displayed, right-click Failover Cluster Management or Failover Cluster Manager, click Manage a Cluster, and then select the cluster you want to view.
  • In the center pane, find Quorum Configuration, and view the description.
  • In the following example, the quorum mode is Node and Disk Majority and the disk witness is Cluster Disk 2. The same information is available from PowerShell, as sketched below.
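
On Windows Server 2008 R2, a minimal PowerShell equivalent; the cluster name is a placeholder:

  Import-Module FailoverClusters

  # Shows the quorum mode and the witness resource (if any)
  Get-ClusterQuorum -Cluster "Cluster01"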

Managing Processor use for Virtual Environments

General Rules for Processor Scheduling

  1. ESX(i) schedules VMs onto and off of processors as needed
  2. Whenever a VM is scheduled to a processor, all of its vCPUs must be available to run together, or the VM cannot be scheduled at all
  3. If a VM cannot be scheduled to a processor when it needs access, VM performance can suffer a great deal.
  4. When VMs are ready for a processor but are unable to be scheduled, the waiting time is recorded in what VMware calls the CPU %Ready value
  5. CPU %Ready manifests itself as a utilisation issue but is actually a scheduling issue
  6. VMware attempts to schedule a VM on the same core over and over again, but sometimes it has to move the VM to another processor. Processor caches contain information that allows the OS to perform better; if the VM is moved across sockets and the cache isn’t shared, the new cache needs to be loaded with this information.
  7. Maintain consistent Guest OS configurations

Scheduling Issues

  1. Mixing single-, dual- and quad-vCPU VMs on the same ESX(i) server can create major scheduling problems. This is especially true when the ESX server has low core densities or when the ESX servers average moderate to high utilisation levels
  2. Where possible, reduce VMs to a single vCPU, unless they host an application that requires multiple CPUs, or unless utilisation on all of the VM’s existing vCPUs is too high to consolidate onto one
  3. Keep an eye on scheduling issues, especially CPU %Ready. More than 2% indicates processor scheduling issues (a PowerCLI sketch for checking this follows the list)
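
One way to check %Ready outside of esxtop is with PowerCLI’s Get-Stat. A minimal sketch, assuming PowerCLI is installed; the vCenter name is a placeholder. Realtime samples cover 20 seconds, so %Ready = ready milliseconds / 20,000 × 100:

  # Requires VMware PowerCLI
  Connect-VIServer -Server "vcenter.example.local"

  # Average CPU ready time per 20-second realtime sample, as a percentage
  Get-VM | Where-Object { $_.PowerState -eq "PoweredOn" } | ForEach-Object {
      $ready = Get-Stat -Entity $_ -Stat "cpu.ready.summation" -Realtime |
          Measure-Object -Property Value -Average
      New-Object PSObject -Property @{
          VM           = $_.Name
          PercentReady = [math]::Round(($ready.Average / 20000) * 100, 2)
      }
  } | Sort-Object PercentReady -Descending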

Performance enhancers for vSphere

  1. Non-scheduling of idle processors

vSphere has the ability to skip scheduling of idle processors. For example, if a quad-vCPU VM has activity on only one vCPU, vSphere can sometimes schedule only that single vCPU. A multi-threaded app will likely use most or all of its vCPUs most of the time. If a VM has vCPUs that sit idle a lot, review whether that VM actually needs multiple processors

If your application is not multi-threaded, you gain nothing by adding cores to the VM, and you make it more difficult to schedule

  2. Processor Skew

Guest OSs expect to see progress on all of their cores all of the time. vSphere allows a small amount of skew, whereby the processors need not be completely in sync, but this has to be kept within reasonable limits

For a detailed description of how ESX(i) schedules VMs to processors, please read:

http://www.vmware.com/files/pdf/perf-vsphere-cpu_scheduler.pdf

ESXTOP Troubleshooting Overview Chart

Really useful ESXTOP Overview Chart of Performance Statistics courtesy of vmworld.net

VMware Visio Shapes

Useful to have for any drawings or diagrams

http://communities.vmware.com/docs/DOC-13702

What is an ODBC Connection?

ODBC stands for Open Database Connectivity. It is a connection that is created to define a link between a computer and a database stored on another system. The ODBC connection contains the information needed to allow a computer user to access information stored in a database that is not local to that computer. You need to define the type of database application, such as Microsoft SQL Server, Oracle, FoxPro, or MySQL. Once you have defined the type of database, you need to select or supply the appropriate driver for the connection (Windows already contains many of these) and then supply the name of the database file and the credentials needed to access the database.
Once the ODBC connection is created, you can tell specific programs to use that ODBC connection to access information in that database.

It was developed by the SQL Access Group in 1992 to standardize the use of a DBMS by an application. ODBC provides a universal middleware layer between the application and the DBMS, allowing the application developer to use a single interface. If changes are made to the DBMS specification, only the driver needs updating. An ODBC driver can be thought of as analogous to a printer or other driver, providing a standard set of functions for the application to use, and implementing DBMS-specific functionality.

An application that can use ODBC is referred to as “ODBC-compliant”. Any ODBC-compliant application can access any DBMS for which a driver is installed. Drivers exist for all major DBMSs and even for text or CSV files.
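
To make this concrete, here is a minimal sketch that opens a connection through an existing DSN from PowerShell via the .NET ODBC classes; the DSN name, credentials, and query are placeholders:

  # Open a connection through an existing DSN (name and credentials are placeholders)
  $conn = New-Object System.Data.Odbc.OdbcConnection
  $conn.ConnectionString = "DSN=MyDatabase;Uid=appuser;Pwd=secret;"
  $conn.Open()

  # Run a simple query through whatever driver the DSN points at
  $cmd = $conn.CreateCommand()
  $cmd.CommandText = "SELECT COUNT(*) FROM SomeTable"
  Write-Output ("Rows: " + $cmd.ExecuteScalar())

  $conn.Close()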

VMware ODBC Connection Example

The database used for vCenter can be installed on a dedicated host, or on the same host, depending on the size of your environment. For scalability, I recommend installing vCenter and SQL Server on separate hosts unless you’re running a very small environment and know that it will never grow

Instructions

  • Note: SQL Server must be installed and a DB created for vCenter.
  • Note: The ODBC source vCenter uses must be created with the SQL Native Client driver (the SQL Server driver provided with Windows will not work). The SQL Native Client driver can be found on the SQL Server 2005 installation media. Install this first.
  • Open Control Panel on your vCenter Server, followed by Administrative Tools
  • Open the Data Sources (ODBC) icon

  • Click the System DSN tab and click the “Add” button

  • Choose SQL Native Client

  • Provide a name for the data source (you will reference this name during the vCenter install) and provide the hostname/IP of the SQL Server instance you created when you installed SQL Server

  • Click Next, select the “SQL Server authentication” option, and provide the login and password created when installing SQL Server. (You will have logged into SQL Management Studio and created a new user with SQL Server Authentication)

  • Click Next and change the default database to the name of your previously created vCenter database

  • Click Next and keep the default settings

  • Click Finish

  • Click Test Data Source; the test should complete successfully. If you prefer, the DSN can also be created by script, as sketched below
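
Because a System DSN is just a set of registry values, it can also be created by script. A hedged sketch of the same DSN in PowerShell; the DSN, server, and database names are placeholders, the driver path assumes the SQL Server 2005 Native Client, and on 64-bit Windows the DSN may need to live under the 32-bit ODBC registry view depending on the vCenter version:

  # Create a System DSN by writing the registry keys it is stored in (run elevated)
  $dsn = "vCenterDSN"
  $ini = "HKLM:\SOFTWARE\ODBC\ODBC.INI"

  if (-not (Test-Path "$ini\$dsn")) { New-Item -Path "$ini\$dsn" | Out-Null }
  Set-ItemProperty -Path "$ini\$dsn" -Name "Driver"   -Value "C:\Windows\System32\sqlncli.dll"
  Set-ItemProperty -Path "$ini\$dsn" -Name "Server"   -Value "SQLSERVER01"
  Set-ItemProperty -Path "$ini\$dsn" -Name "Database" -Value "VCDB"

  # Register the DSN so it appears on the System DSN tab
  if (-not (Test-Path "$ini\ODBC Data Sources")) { New-Item -Path "$ini\ODBC Data Sources" | Out-Null }
  Set-ItemProperty -Path "$ini\ODBC Data Sources" -Name $dsn -Value "SQL Native Client"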

Deleting a VM with Raw Disk Mappings

How do you delete a VM with Raw Disk Mappings?

To the best of my knowledge, if you delete the VM it will only delete the pointer file and you would have to delete the RDM on the SAN.

Before deleting the VM, you could always click Edit Settings, select the RDM, and choose Remove Disk – Delete from disk. This should delete both the pointer file and the RDM. A PowerCLI equivalent is sketched below.
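
A hedged PowerCLI sketch along the same lines; the vCenter and VM names are placeholders, and -DeletePermanently mirrors the “Delete from disk” option in the GUI:

  Connect-VIServer -Server "vcenter.example.local"

  # Find the raw device mapping disks attached to the VM (name is a placeholder)
  $rdm = Get-VM -Name "MyVM" | Get-HardDisk | Where-Object { $_.DiskType -like "Raw*" }

  # Remove the disk and delete the mapping file, like "Remove Disk - Delete from disk"
  Remove-HardDisk -HardDisk $rdm -DeletePermanently -Confirm:$false

As with the GUI method, the LUN itself is still presented by the array until it is removed on the SAN.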

Should you put Antivirus on your VMware Hosts?

Installing antivirus software on the Service Console is a choice between security and performance for an ESX(i) host. It will impact your performance dramatically, since it will use RAM/CPU and can cause performance degradation. ESX(i) is a very secure platform, and if you lock down your Service Console you’re pretty safe. I wouldn’t recommend deploying any antivirus to the ESX(i) Service Console at all. To maximize security on ESX(i) and the Service Console, you can apply the Tripwire ConfigCheck tool, the CIS security guide, or even the DoD UNIX SRR scripts, which scan and remediate in depth.

VMKernel Security

The security features below are built into VMware ESXi. They indicate that VMware have already thought the security aspect through very well, and they suggest why adding antivirus to these systems may not be absolutely necessary.

  • Memory Hardening

The ESX/ESXi kernel, user-mode applications, and executable components such as drivers and libraries are located at random, non-predictable memory addresses. Combined with the nonexecutable memory protections made available by microprocessors, this provides protection that makes it difficult for malicious code to use memory exploits to take advantage of vulnerabilities.

  • Kernel Module Integrity

Digital signing ensures the integrity and authenticity of modules, drivers and applications as they are loaded by the VMkernel. Module signing allows ESXi to identify the providers of modules, drivers, or applications and whether they are VMware-certified.

  • Trusted Platform Module (ESXi only, enabled in the BIOS)

This module is a hardware element that represents the core of trust for a hardware platform and enables attestation of the boot process, as well as cryptographic key storage and protection. Each time ESXi boots, TPM measures the VMkernel with which ESXi booted in one of its Platform Configuration Registers (PCRs). TPM measurements are propagated to vCenter Server when the host is added to the vCenter Server system.

Recommendations

  • Lock down the service console to only those that require direct access to it.
  • Don’t allow root logins; use sudo instead, and integrate the logins into an LDAP or AD infrastructure.
  • There are very few viruses that affect a Linux system, and they require specialized privileges to infect and then run on a *nix system.
  • You want to minimize the agents running on an ESX(i) host; even though it has a service console that is Linux, you want to minimize any additional software that may interfere with the running of the VMkernel.

Robert Moog’s 78th birthday celebrated in Google doodle

Google has marked the birthday of music pioneer Robert Moog by creating a ‘Doodle’ in the form of an interactive electronic synthesiser which can be played by clicking on its keys using a cursor.

Although musical synthesisers already existed, Moog transformed pop music during the 1960s by producing and marketing a small keyboard synth which could be used with relative ease.

Bands including the Beatles and the Doors used the Moog synthesiser, while others later became fans of the Minimoog, a stripped-down version which followed it in the 1970s.

The New Yorker, who would have turned 78 on Wednesday, had been encouraged to dabble in electronics from an early age by his father and built his first electronic instrument, a theremin, at the age of 14.

The pair started their own company in 1954 to sell theremin kits by mail order. After studying at the Bronx High School of Science, Moog attended Queens College before graduating in electrical engineering at Columbia University and earning a doctorate in engineering physics at Cornell.

Of the Moog synthesiser, which appeared in 1964, the inventor was to later recall: “I didn’t know what the hell I was doing. I was doing this thing to have a good time, then all of a sudden someone’s saying to me, ‘I’ll take one of those and two of that.’ That’s how I got into business.”

Moog died in 2005 at the age of 71, after being diagnosed with a brain tumour four months previously. However, the Moog sound has lived on, with musicians such as Fatboy Slim choosing to continue to use it even in the digital era.

What is the /admin switch in Microsoft Terminal Services Client (MSTSC) for Windows 2008 and Vista?

Although the /console switch no longer has any effect on Server 2008 and Vista Terminal Server connections, a new switch called the /admin switch has a similar effect when you use it to connect to a Server 2008 server with the Terminal Services role. When you use this switch with MSTSC, connections don’t consume Terminal Services CALs.

The /admin switch involves elevated rights. If a user has the authority to use the /admin switch but has been marked with Deny Users Permissions To Log On To Terminal Server, he or she will still be able to connect using mstsc /admin. Also, if a terminal server is in drain mode (no new sessions are accepted), an /admin session can still be created. The /admin sessions don’t count toward the session limit that may be configured on a terminal server to limit the number of sessions. Usage is sketched below.
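
For example, from a command prompt or PowerShell (the server name is a placeholder):

  # Connect to the administrative session of a server without consuming a TS CAL
  mstsc /v:server01.contoso.com /admin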

VMware DB Scripts for Performance Stats

We reinstalled our SQL Server 2008 Database completely and restored the Virtual Center DB and the Virtual Update Manager DB mainly due to someone installing the SQL Server 2008 software in Evaluation Mode which then expired and caused us no end of problems!

What you need to remember if you rebuild and restore the databases is to add the performance scripts back into SQL Server Management Studio for the Weekly, Monthly and Yearly rollups

The Issue

Following a SQL DB re-installation and restore, we were doing the following:

  • Click on Host
  • Click on Performance Tab
  • Click Advanced
  • Click on Chart Options
  • Choose Week, or Past Month

It comes up with “Performance Data is currently not available for this entity”

Instructions

  • Depending on how you are set up (we had a separate VM for our DB and vCenter Server):
  • Log into SQL Management Studio on your Database Server
  • On the DB Server, create a shortcut to the following path on your vCenter Server (this contains the scripts you need and may be on the C: or D: drive): D:\Program Files\VMware\Infrastructure\VirtualCenter Server
  • Make sure you have the Virtual Center DB selected in SQL Management Studio. Then just double-click the file; it should open against the VCDB.
  • Check this link for the script names:

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1004382

  • If you have problems with that for any reason, you can create a new query against the VCDB and then open up the SQL scripts in Notepad and copy across.
  • The scheduled jobs trigger the stored procedures, which are already part of the DB. Amongst other things, they clear out the raw data after it has been processed and is no longer needed; they will also remove old tasks and events from the DB. A command-line way to load the scripts is sketched below.
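
If you prefer the command line to Management Studio, sqlcmd can load the scripts as well. A minimal sketch; the server, database, and script file names are placeholders (see the VMware KB above for the real file names):

  # Run a rollup script against the vCenter database using integrated authentication
  sqlcmd -S "SQLSERVER01" -d "VCDB" -E -i "C:\Scripts\example_rollup_script.sql"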

Scripts

This is what you should see in your SQL Server Management Studio application once the scheduled jobs are back in place.