Host Profile Compliance Unknown in vSphere 6.5U2

The Problem

This error came up in a vSphere 6.5U2 environment running on HP ProLiant BL460c Gen9 servers. Where we previously had fully compliant hosts, checking compliance of a host against a host profile now only showed Host Compliance Unknown.

The Resolution

I’m still not entirely sure how we went from a state of Compliant hosts to Unknown, but this is how we resolved it:

  • Deleted the AutoDeploy Rule
  • Recreated the AutoDeploy Rule (a PowerCLI sketch of these two steps follows at the end of this list)
  • Ran the command below. Any time you make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet, or by running the single PowerCLI command below

foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}

  • Checked the status of the atftpd service in vCenter. Note: We are using the inbuilt TFTP server in vCenter. This is no longer supported by VMware, but we find it works just fine. SolarWinds is a good alternative.

service atftpd status

service atftpd start

  • Rebooted one host while checking the startup in the ILO console.
  • You may need to remediate your hosts after boot-up, or if everything is configured correctly, each host will simply boot up, add itself to the cluster and become compliant
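For reference, here is a minimal PowerCLI sketch of the rule delete/recreate and cache repair steps above. The vCenter name, offline bundle path, rule name, host profile and cluster names are placeholders for whatever exists in your environment.

Connect-VIServer vcenter.lab.local
Add-EsxSoftwareDepot "D:\Depot\ESXi-6.5U2-offline-bundle.zip"                          # offline bundle path is a placeholder
$img = Get-EsxImageProfile | Where-Object { $_.Name -like "*6.5*standard*" }
$hp  = Get-VMHostProfile -Name "Prod-HostProfile"
$cl  = Get-Cluster -Name "Prod-Cluster"
Remove-DeployRule -DeployRule (Get-DeployRule -Name "Prod-AutoDeploy-Rule") -Delete    # delete the old rule
$rule = New-DeployRule -Name "Prod-AutoDeploy-Rule" -Item $img,$hp,$cl -Pattern "vendor=HP"
Add-DeployRule -DeployRule $rule                                                       # add the new rule to the active ruleset
# update the cached host entries so the hosts pick up the new rule on next boot
foreach ($esx in Get-VMHost) {$esx | Test-DeployRuleSetCompliance | Repair-DeployRuleSetCompliance}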

SRM 8.1.1 and Hitachi F900

Initial Requirements

  • Hitachi CCI installer – RMHORC_X64
  • SRM Installer – VMware-srm-8.1.1-10646916
  • Hitachi SRA installer – Hitachi_Raid_Manager_SRA_Ver02.03.01
  • 1 x Windows Server 2016 Primary site SRM Server
  • 1 x Windows Server 2016 Secondary site SRM Server
  • 1 x 50MB command device on the Primary site, presented and mounted to the SRM VM as a Physical mode RDM
  • 1 x 50MB command device on the Secondary site, presented and mounted to the SRM VM as a Physical mode RDM
  • 2 x SRM service accounts for the protected SRM server and recovery SRM server unless you want to run SRM under the Local system account
  • 2 x SRM DB service accounts for the protected SRM server and recovery SRM server
  • SSL certificates – https://kb.vmware.com/s/article/2085644

What is the Hitachi SRA?

Hitachi Storage Replication Adapter (SRA) is an interface that integrates Hitachi storage systems and replication software with VMware® vCenter SRM™ processes.

What is the Hitachi CCI?

Hitachi’s remote and in-system replication software require CCI to manage
the pairs. The adapter plug-in links CCI with Site Recovery Manager.
There are two CCI components:

  • Command devices, which reside on the storage systems. CCI uses the
    command device as the interface to the storage system from the host. The command device accepts commands from the host and executes them on the storage system. The command device is a dedicated logical volume.
  • Hitachi Open Remote Copy Manager (HORCM), which resides on the CCI server. HORCM operates as a daemon process. When activated, HORCM refers to CCI configuration definition files, also located on the server. The HORCM instance communicates with the storage system and remote servers.
    HORCM definition files describe the storage systems, pair volumes, and data paths. When a user issues a command, CCI uses the information in the HORCM files to identify which volumes are the targets of the command.
    Two HORCM files are needed for each pair. One file describes the primary volumes (P-VOLs), which are also referred to as “protected volumes”, and the other describes the secondary volumes (S-VOLs), which are also referred to as “recovery volumes”.

VMware SRM and Hitachi Components

Installation Steps

  • Ask the Storage Team to present a 50MB LUN to the hosts. This will be the command device. Edit the settings of each Primary and Secondary SRM VM and add the 50MB LUN as an RDM. Log into each SRM VM, bring the disk online and initialise it, but do not format it

The storage team need to make sure the Command Device has the following settings on the Hitachi side or the HORCM service will not run correctly.

  • Go to the SRM installer and Run as Administrator
  • Select a language
  • Click Next
  • Click Next
  • Accept the License agreement
  • Check Prerequisites
  • Change the install directory if you want or leave it on the C drive. We install ours on the D drive
  • Select a vCenter Server to register to
  • Fill in the Site name
  • Fill in an email address
  • Fill in the IP address for the SRM server
  • Choose the Default Site Recovery Manager Plug-in identifier
  • Select what certificate to use. I have generated a PKCS#12 cert so I will use a signed certificate
  • Note: When I generated the certificate through OpenSSL, I specified a password which is what you will need to enter when adding the certificate – https://kb.vmware.com/s/article/2085644
  • The certificate will have a .p12 extension
  • Choose the embedded option as this now supports a full installation of SRM
  • Enter the details in the Embedded Database Configuration
  • Enter the Site Recovery Manager Service Account
  • Click Finish to start the installation
  • You will see the installer creating the SRM Database
  • When it finishes, it should show the below screen
  • If you log into the vCenter you should see the Site Recovery icon in the menu
  • If you click Home and select Recovery Manager, you will see the below screen.
  • If you click open Site Recovery at the moment, it will ask you to sign in with SSO credentials and then show the below message. Leave it here while we move on to installing the Recovery SRM server
  • Now you need to repeat all the above install steps on the Recovery SRM Server
  • Once the Recovery SRM install is complete, log into vCenter, go to Site Recovery Manager and click on New Site Pair
  • Enter the details of the First site and Second site
  • Click next and check the details
  • Click Finish to Pair the sites
  • Now you will see the below screen if it is successful
  • If you now click on the View Details screen, then you will see the full details come up for the two sites
  • Next we need to install the Hitachi Command Control Interface
  • Note: I have already copied the software
  • Right click on Setup and run as Administrator
  • Read the below text and click Next
  • The default installation drive is C:\HORCM. I’m installing everything on my D Drive so you’ll see the Destination folder as D:\HORCM
  • The installer will run and finish
  • Reboot the server
  • When the server has rebooted, verify the correct version of the CCI software is running on the system by executing the below command
  • D:\HORCM\etc> raidqry -h
  • Install the CCI software on the recovery SRM server, reboot and check the version as per the above steps
  • Next, you will need two HORCM configuration definition files to define the pair relationship: one file describes the primary volumes (P-VOLs) on the Protected SRM Server, the other file describes the secondary volumes (S-VOLs) on the Recovery SRM Server.
  • Take a copy of the default HORCM.conf file which gets installed with CCI in D:\HORCM\etc, rename it and place it in D:\HORCM\conf. Note: Just for clarity, I have named the HORCM.conf file on the Protected Server HORCM100.conf and I will rename the HORCM.conf file on the Recovery SRM Server HORCM101.conf. They must be consecutive numbers
  • And the same on the Recovery site
  • Open up the HORCM100.conf file in Notepad and have a look at how it is structured (Wordpad seems to lose clarity). It is quite a large file full of information (Hitachi documentation example below). You will find the file is much larger than this and can be cut down very simply to the below

Example HORCM0.conf file from the Hitachi SRA for VMware vCenter SRM deployment guide

  • HORCM_MON – Information for monitoring the HORCM instance. Includes the IP address of the primary server, HORCM instance or service, polling interval for monitoring paired volumes and timeout period for communication with the remote server.
  • HORCM_CMD – Command device from the protected storage system. Replace the number with the serial number of the primary storage system
  • HORCM_LDEV – #dev_group is the group name for the pairs. dev_name is the pair name (the example uses P_VOL_S_VOL). The serial number is the storage system’s serial number. CU:LDEV(LDEV#) is the LDEV ID of the P-VOL. MU# is the mirror unit number. Use MU#0-2 for ShadowImage, Thin Image and Copy-on-Write Snapshot. You do not need to specify MU# for TC, UR and GAD. If you want to specify MU# for TC, UR, and GAD, use MU#h0 for TC and MU#h0-h3 for UR and GAD.
  • HORCM_INST – #dev_group is the group name for the pairs. ip address is the network address of the remote SRM server. service is the remote HORCM instance
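As the screenshot of the example file is not reproduced here, below is a rough sketch of what a cut-down HORCM100.conf for the protected site might look like. The group names, array serial number and command device are the ones used later in this guide; the IP addresses, service names, LDEV IDs and poll/timeout values are placeholders only, and MU# is left blank for the remote copy pairs as per the notes above.

HORCM_MON
#ip_address          service      poll(10ms)    timeout(10ms)
10.10.10.11          horcm100     1000          3000

HORCM_CMD
#dev_name
\\.\PhysicalDrive2

HORCM_LDEV
#dev_group                 dev_name                   Serial#    CU:LDEV(LDEV#)    MU#
VI-SPSP-DS-REP-0204-L5     VI-SPSP-DS-REP-0204-L5     415068     02:04
VI-SPSP-DS-REP-0205-L6     VI-SPSP-DS-REP-0205-L6     415068     02:05

HORCM_INST
#dev_group                 ip_address      service
VI-SPSP-DS-REP-0204-L5     10.10.10.12     horcm101
VI-SPSP-DS-REP-0205-L6     10.10.10.12     horcm101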

Example HORCM1.conf for the secondary site remote replication pair 

  • HORCM_MON – Shows the IP address of the secondary server, HORCM instance or service, polling interval for monitoring paired volumes, and timeout period for communication with the remote server
  • HORCM_CMD – Shows the command device on the remote site. Note that the instance or service is increased from the primary instance by 1. Use the storage system’s serial number
  • HORCM_LDEV – Shows the same group and device name for the pair as used in the primary site HORCM file. The second entry in this section is a group for the ShadowImage pair used for testing; the remote pair’s S-VOL is the in-system pair’s P-VOL. When using ShadowImage for the in-system pair, make sure that the MU number is set for the P-VOL.
  • HORCM_INST – Shows the pair’s group name and the IP address and service number of the primary host. The second entry in this section shows the secondary host address
  • The TC or UR group must be defined before the SI group.
  • The MU# (h0-h3) for UR and GAD devices must be specified.
  • The MU# for ShadowImage devices must be specified. If MU#1 or MU#2 are used, the environment variable RMSRATMU must be set
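A matching sketch for HORCM101.conf on the recovery site, with the same caveats (the IP addresses, service names and LDEV IDs are placeholders, and the optional ShadowImage test group described above is omitted for brevity):

HORCM_MON
#ip_address          service      poll(10ms)    timeout(10ms)
10.10.10.12          horcm101     1000          3000

HORCM_CMD
#dev_name
\\.\PhysicalDrive2

HORCM_LDEV
#dev_group                 dev_name                   Serial#    CU:LDEV(LDEV#)    MU#
VI-SPSP-DS-REP-0204-L5     VI-SPSP-DS-REP-0204-L5     415073     03:04
VI-SPSP-DS-REP-0205-L6     VI-SPSP-DS-REP-0205-L6     415073     03:05

HORCM_INST
#dev_group                 ip_address      service
VI-SPSP-DS-REP-0204-L5     10.10.10.11     horcm100
VI-SPSP-DS-REP-0205-L6     10.10.10.11     horcm100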

Here are the 2 files together so you can see how it all works

  • Do not edit the configuration definition file while CCI is running. Shut down CCI (horcmshutdown), edit the configuration file as needed, and then restart CCI (horcmstart). When you change the system configuration, you must shut down CCI, rewrite the configuration definition file to match the change, and then restart CCI. When you change the storage system configuration (microprogram, cache capacity, LU path, and so on), you must restart CCI regardless of whether the configuration definition file needs editing. When you restart CCI, confirm that there is no contradiction in the connection configuration by using the “-c” option of the pairdisplay command and the raidqry command. However, you cannot confirm the consistency of the P-VOL and S-VOL capacity with the “-c” option of the pairdisplay command. Confirm the capacity of each volume by using the raidcom command
  • The HORCM.conf file has set parameters as seen below

Environment variables

RMSRA20 requires that the following system environment variables be defined in order to make certain parameters available

Sometimes it may be worth speaking to Hitachi about whether these are needed for certain environments. We have none set at the moment in ours, but the list is here for reference.

Install the Hitachi SRA – Hitachi_Raid_Manager_SRA_Ver02.03.01.zip

  • Extract the installer from the zip – HITACHI_RMHTCSRA_X64-02-03-01.exe
  • Run as Administrator
  • Accept the License Agreement
  • Choose a destination. I had to change my path to the D Drive as this is where my SRM installation is located
  • Click Next and Install
  • Restart the VMware Site Recovery Manager Service on the Protected SRM Server
  • Install the Hitachi SRA software on the Recovery SRM server
  • Restart the VMware Site Recovery Manager Service on the Recovery SRM Server

Find the Command Device Name and Array Serial number on each SRM Server

First we need to find the Command Device Name and the serial number of the array on each SRM Server

  • On the Primary SRM Server, open an elevated command prompt and navigate to the horcm\etc folder on D:
  • Run the following command to identify the array’s command device name and serial number
  • raidscan -x findcmddev hdisk0,100
  • The primary array serial number is 415068
  • The command device is \\.\PhysicalDrive2
  • On the Secondary SRM Server, open an elevated command prompt and navigate to the horcm\etc folder on D:
  • Run the following command to identify the array’s command device name and serial number
  • raidscan -x findcmddev hdisk0,100
  • The secondary array serial number is 415073
  • The command device is \\.\PhysicalDrive2

Add the details above to the HORCM100.conf on the Primary SRM Server and HORCM101.conf file on the Secondary SRM Server

  • At the top of the HORCM100.conf file we put in the serial number of the array as it makes it easier for us to liaise with Support and Storage if we have an issue, but it is not mandatory
  • In HORCM_MON we add the IP address of the Primary SRM server and the serial number of the Primary storage array
  • In HORCM_CMD, we put in the command device which is \\.\PhysicalDrive2
  • Note: A lot of info is already there but I will talk through these as we go.
  • At the top of the HORCM101.conf file we put in the serial number of the array as it makes it easier for us to liaise with Support and Storage if we have an issue, but it is not mandatory
  • In HORCM_MON we add the IP address of the Secondary SRM server and the serial number of the Secondary storage array
  • In HORCM_CMD, we put in the command device which is \\.\PhysicalDrive2

Configure the opposite details for each site within the HORCM100.conf file on the Primary SRM server and the HORCM101.conf file on the Secondary SRM Server

  • Under the section HORCM_INST within the HORCM100.conf file, fill in the below details highlighted in yellow
  • Put in the IP address of the Secondary SRM server
  • Put in the name of the HORCM101.conf file on the Secondary SRM server
  • Under the section HORCM_INST within the HORCM101.conf file, fill in the below details highlighted in yellow
  • Put in the IP address of the Primary SRM server
  • Put in the name of the HORCM100.conf file on the Primary SRM server

Configure the HORCM100_run.txt on the Primary SRM Server and then HORCM101_run.txt file on the Secondary SRM Server

  • Navigate to D:\HORCM\Tool\HORCM100_run.txt
  • Set the below parameters highlighted in yellow below
  • Navigate to D:\HORCM\Tool\HORCM101_run.txt
  • Set the below parameters highlighted in yellow below

Run the following command from the tool folder on the Primary SRM Server and Secondary SRM Server

  • Run the following command from the tool folder on the Primary SRM Server and change the HORCM number to the one you are using
  • D:\HORCM\Tool>svcexe.exe /S=HORCM100 "/A=D:\HORCM\Tool\svcexe.exe"
  • Run the following command from the tool folder on the Secondary SRM Server and change the HORCM number to the one you are using
  • D:\HORCM\Tool>svcexe.exe /S=HORCM101 "/A=D:\HORCM\Tool\svcexe.exe"

Add an Inbound Windows Firewall rule for Port 11088 on the Primary SRM Server and an Inbound Windows Firewall rule for Port 11089 on the Secondary SRM Server

  • Go to Windows Firewall with Advanced Security
  • On Inbound rules, select new Rule
  • On Rule Type, select Port
  • Select UDP
  • Put in 11088
  • Select Allow the Connection
  • Untick Public
  • Put in a name HORCM100 In
  • Go to Windows Firewall with Advanced Security
  • On Inbound rules, select new Rule
  • On Rule Type, select Port
  • Select UDP
  • Put in 11089
  • Select Allow the Connection
  • Untick Public
  • Put in a name HORCM101 In
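If you prefer to script the two rules above, a minimal PowerShell sketch is below: run the first line on the Primary SRM server and the second on the Secondary. The display names and port numbers match the steps above; adjust the firewall profiles to suit your environment.

New-NetFirewallRule -DisplayName "HORCM100 In" -Direction Inbound -Protocol UDP -LocalPort 11088 -Action Allow -Profile Domain,Private
New-NetFirewallRule -DisplayName "HORCM101 In" -Direction Inbound -Protocol UDP -LocalPort 11089 -Action Allow -Profile Domain,Private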

Update the Services file and start the service on the Primary SRM Server and the Secondary SRM Server

  • On the Primary Server, go to C:\Windows\System32\drivers\etc
  • Update the services file under C:\Windows\System32\drivers\etc\services with the HORCM service entries (example entries below)
  • Repeat the above on the Secondary SRM Server
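The screenshot of the services file is not reproduced here, but the entries added on both servers would look something like the below, assuming the UDP ports from the firewall rules above. The service names are an assumption and must match whatever your HORCM instances and HORCMxxx_run.txt files reference.

horcm100    11088/udp    # HORCM instance 100 (Primary SRM server)
horcm101    11089/udp    # HORCM instance 101 (Secondary SRM server)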

Start the HORCM100 Service on the Primary SRM Server and start the HORCM101 service on the Secondary SRM Server

  • On the Primary SRM Server, click Start – Run – services.msc
  • Start HORCM100
  • On the Secondary SRM Server, click Start – Run – services.msc
  • Start HORCM101

Next we need to speak to the Storage Team and obtain our replicated LUNs then pair them

Note: You will be prompted for a username and password – Ask your storage admin to create one. Ours is called HORCM in the command below.

paircreate -I100 -vl -g VI-SPSP-DS-REP-0204-L5 -d VI-SPSP-DS-REP-0204-L5 -f never -c 15

paircreate -I100 -vl -g VI-SPSP-DS-REP-0205-L6 -d VI-SPSP-DS-REP-0205-L6 -f never -c 15

There is a very important note to add here: the -vl flag in the above commands tells the SAN to create the pairing based on the local HORCM instance that is referenced (100 in the case of these commands, as indicated by the -I100 flag). What this means is that the local LDEV will become the Primary replication LDEV, with the LDEV in the other datacentre becoming the Secondary. So in this case, because we have run the command from the PDC SRM server, the replication will go from PDC > SDC, and the datastore in vCenter has to be created in PDC and will be replicated to SDC. With this in mind, it is vital that the pair creation commands are run from the correct SRM server: if the datastores are to be created in PDC then the pairs need to be created on the PDC SRM server, otherwise the replication will be the wrong way around. After the pair create commands have been run, you can rerun the pair display commands to confirm the correct Primary and Secondary sites; this is discussed in more detail below.

Next Run a Pair display to make sure the LUNs are paired

The -g flag dictates which group will be checked (same as the dev_group from the HORCM file).

The -IH flag dictates which HORCM instance to query. The -fxc flags dictate which info will be shown by the command.
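As a hedged example, using one of the group names from the paircreate commands above and querying HORCM instance 100 on the protected SRM server, the pair display command would look something like this:

pairdisplay -g VI-SPSP-DS-REP-0204-L5 -IH100 -fxc

Run the equivalent against instance 101 on the recovery SRM server to view the pair from the other side.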

Next steps – Log into vCenter and Site Recovery Manager

You will be on the Site pair page. You can also see the other 3 options

Click the issues to see if there are any problems

Next go to Array Based Replication and click on Storage Replication Adapters. Click both sites to make sure everything is OK

Click on Array Pairs and click Add

The Array pair wizard will open

For the name, enter Hitachi-ArrayManager

For the local protected HORCM site, enter HORCMINST=100 (100 is our HORCM instance on our protected site)

For the username and password, enter the credentials you have been given by your storage administrator.

In our case the username is horcm and then put in the password

For the name of the remote recovery array manager, enter Hitachi-ArrayManager-Remote

For the remote recovery HORCM site, enter HORCMINST=101 (101 is our HORCM instance on our recovery site)

For the username and password, enter the credentials you have been given by your storage administrator.

In our case the username is horcm and then put in the password

The array pairs screen will then come up

Click Next and check the last screen and finish

You will now see the paired arrays

If you click on the Array pair, then below you will see the paired datastores

Next we will configure Network Mappings

Select the Recovery network

Check the Test networks. These are used instead of the recovery networks while running tests

Check the Ready to Complete page and click Finish

Next, we will go through Folder Mappings

Choose Prepare Mappings manually

Select the mappings on both sides and click Add

The mappings will look similar to the below screen-print

Select the Reverse mappings

Click Finish after checking the Final screen

Next go to Resource Mapping

Select the Cluster Resource

Select the Reverse mappings

Check the Final Page and click finish

Placeholder Datastores

When you create an array-based replication protection group that contains datastore groups or a vSphere Replication protection group that contains individual virtual machines, Site Recovery Manager creates a placeholder virtual machine at the recovery site for each of the virtual machines in the protection group.

A placeholder virtual machine is a subset of virtual machine files. Site Recovery Manager uses that subset of files to register a virtual machine with vCenter Server on the recovery site.

The files of the placeholder virtual machines are very small, and do not represent full copies of the protected virtual machines. The placeholder virtual machine does not have any disks attached to it. The placeholder virtual machine reserves compute resources on the recovery site, and provides the location in the vCenter Server inventory to which the protected virtual machine recovers when you run recovery.

The presence of placeholder virtual machines on the recovery site inventory provides a visual indication to vCenter Server administrators that the virtual machines are protected by Site Recovery Manager. The placeholders also indicate to vCenter Server administrators that the virtual machines can power on and start consuming local resources when Site Recovery Manager runs tests or runs a recovery plan.

When you recover a protected virtual machine by testing or running a recovery plan, Site Recovery Manager replaces the placeholder with the recovered virtual machine and powers it on according to the settings of the recovery plan. After a recovery plan test finishes, Site Recovery Manager restores the placeholders and powers off the recovered virtual machines as part of the cleanup process.

Go to Site Recovery Manager > Configure > Placeholder Datastores and click +New

Choose the datastore you created to be the Placeholder Datastore

You will then see the Placeholder Datastore added in SRM

Select the Placeholder Datastore

You will now see your Recovery Placeholder Datastore under the Recovery vCenter

Next we need to create a Protection Group

In SRM, protection groups are a way of grouping VMs that will be recovered together. A protection group contains VMs whose data has been replicated by either array-based replication (ABR) or vSphere Replication (VR). A protection group cannot contain VMs replicated by more than one replication solution (e.g. the same VM protected by both vSphere Replication and array-based replication), and a VM can only belong to a single protection group.

How do Protection Groups fit into SRM?

Recovery Plans in SRM are like an automated run book, controlling all the steps in the recovery process. The recovery plan is the level at which actions like Failover, Planned Migration, Testing and Reprotect are conducted. A recovery plan contains one or more protection groups and a protection group can be included in more than one recovery plan. This provides for the flexibility to test or recover the email application by itself and also test or recover a group of applications or the entire site. Thanks to Kato Grace for this information and diagram below

Click New in the Protection Group screen

Fill in the necessary details and make sure you select the right direction

Select the type of replication (in this case we are using Datastore groups (array-based replication))

Click Next and choose the Datastore(s) you want to add to the Protection Group

Select whether you want to add the Protection Group to a Recovery Plan. For now I will say Do not add as we will go through a Recovery Plan next

Check the Ready to Complete screen and make sure everything is as expected. Click Finish.

You will then be back to the Protection Group page which looks like the following

If you click on the Protection Group, you will see all the details. Check any issues and have a look through the tabs to check everything looks as expected.

Next we will set up a Recovery Plan. Click on the Recovery Plan tab and click New

Put in a Name, Description, Direction and Location

Choose your Protection Group(s)

Leave everything as it is in the Test networks Screen

Click Next and on the Ready to Complete screen, check the details and click Finish

Click on the Recovery Plan tab and then on your previously created Recovery Plan

Installing SolarWinds TFTP server

Where to download Solarwinds TFTP Server

https://www.solarwinds.com/free-tools/free-tftp-server

Installation and Configuration

  • Before installing the SolarWinds TFTP Server, the installer will prompt you to install .NET Framework 3.5 if you don’t already have it. You may already have it enabled, or the installer will try to locate and install it for you. The other option is to install it via the Roles and Features option on the Windows Server. For reference, I am using a Windows 2012 R2 server.
  • Right click on the installer and Run as Administrator
  • Accept the License Agreement and click Next
  • Click Install
  • Click Finish
  • Open the SolarWinds TFTP Server
  • Click File > Configure
  • The below screen will come up. Make a note of the TFTP server root directory.
  • Other screens look like the below. Server Bindings
  • Security
  • Language Screen
  • You may need to modify the Windows firewall with a rule to allow inbound traffic port 69 UDP for TFTP.
  • You now need to download the TFTP Boot Zip file and unzip it into your TFTP folder which here is c:\TFTP-Root

Taking a look at AutoDeploy in vSphere 6.5U2

What is AutoDeploy?

vSphere Auto Deploy lets you provision hundreds of physical hosts with ESXi software.

Using Auto Deploy, administrators can manage large deployments efficiently. Hosts are network-booted from a central Auto Deploy server. Optionally, hosts are configured with a host profile of a reference host. The host profile can be set up to prompt the user for input. After boot-up and configuration are complete, the hosts are managed by vCenter Server just like other ESXi hosts.

Types of AutoDeploy Install

Auto Deploy can also be used for Stateless caching or Stateful install. There are several more options than there were before which are shown below in a screen-print from a host profile.

What is stateless caching?

Stateless caching addresses the dependency on the Auto Deploy infrastructure by caching the ESXi image on the host’s local storage. If Auto Deploy is unavailable, the host will boot from its locally cached image. There are a few things that need to be in place before stateless caching can be enabled:

  • Hosts should be set to boot from network first, and local disk second
  • Ensure that there is a disk with at least 1 GB available
  • The host should be set up to get its settings from a Host Profile as part of the AutoDeploy rule set.

To configure a host to use stateless caching, the host profile that it will receive needs to be updated with the relevant settings. To do so, edit the host profile, and navigate to the ‘System Image Cache Profile Settings’ section, and change the drop-down menu to ‘Enable stateless caching on the host’

Stateless caching can be seen in the below diagram

What is a Stateful Install?

It is also possible to have AutoDeploy install ESXi to the host’s local disk. When the host first boots, it will pull the image from the AutoDeploy server; on all subsequent restarts the host will boot from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.

AutoDeploy stateful installs are configured in the same way as stateless caching. Edit the host profile, this time changing the option to ‘Enable stateful installs on the host’:

AutoDeploy Architecture

Pre-requisites

A vSphere Auto Deploy infrastructure will contain the below components

  • vSphere vCenter Server – vSphere 6.7U1 is the best and most comprehensive option to date.
  • A DHCP server to assign IP addresses and TFTP details to hosts on boot up – Windows Server DHCP will do just fine.
  • A TFTP server to serve the iPXE boot loader
  • An ESXi offline bundle image – Download from my.vmware.com.
  • A host profile to configure and customize provisioned hosts – Use the vSphere Web Client.
  • ESXi hosts with PXE enabled network cards 

1. VMware AutoDeploy Server

  • Serves images and host profiles to ESXi hosts.
  • vSphere Auto Deploy rules engine
  • Tells the vSphere Auto Deploy server which image profile and which host profile to serve to which host. Administrators use vSphere Auto Deploy to define the rules that assign image profiles and host profiles to hosts

2. Image Profile Server

Define the set of VIBs to boot ESXi hosts with.

  • VMware and VMware partners make image profiles and VIBs available in public depots. Use vSphere ESXi Image Builder to examine the depot and use the vSphere Auto Deploy rules engine to specify which image profile to assign to which host.
  • VMware customers can create a custom image profile based on the public image profiles and VIBs in the depot and apply that image profile to the host

3. Host Profiles

Define machine-specific configuration such as networking or storage setup. Use the host profile UI to create host profiles. You can create a host profile for a reference host and apply that host profile to other hosts in your environment for a consistent configuration

4. Host customization

Stores information that the user provides when host profiles are applied to the host. Host customization might contain an IP address or other information that the user supplied for that host. For more information about host customizations, see the vSphere Host Profiles documentation.

Host customization was called answer file in earlier releases of vSphere Auto Deploy

5. Rules and Rule Sets

Rules

Rules can assign image profiles and host profiles to a set of hosts, or specify the location (folder or cluster) of a host on the target vCenter Server system. A rule can identify target hosts by boot MAC address, SMBIOS information, BIOS UUID, Vendor, Model, or fixed DHCP IP address. In most cases, rules apply to multiple hosts. You create rules by using the vSphere Client or vSphere Auto Deploy cmdlets in a PowerCLI session. After you create a rule, you must add it to a rule set. Only two rule sets, the active rule set and the working rule set, are supported. A rule can belong to both sets (the default) or only to the working rule set. After you add a rule to a rule set, you can no longer change the rule. Instead, you copy the rule and replace items or patterns in the copy. If you are managing vSphere Auto Deploy with the vSphere Client, you can edit a rule if it is in the inactive state

You can specify the following parameters in a rule.

Active Rule Set

When a newly started host contacts the vSphere Auto Deploy server with a request for an image profile, the vSphere Auto Deploy server checks the active rule set for matching rules. The image profile, host profile, vCenter Server inventory location, and script object that are mapped by matching rules are then used to boot the host. If more than one item of the same type is mapped by the rules, the vSphere Auto Deploy server uses the item that is first in the rule set.

Working Rule Set

The working rule set allows you to test changes to rules before making the changes active. For example, you can use vSphere Auto Deploy cmdlets for testing compliance with the working rule set. The test verifies that hosts managed by a vCenter Server system are following the rules in the working rule set. By default, cmdlets add the rule to the working rule set and activate the rules. Use the NoActivate parameter to add a rule only to the working rule set.

You use the following workflow with rules and rule sets.

  1. Make changes to the working rule set.
  2. Test the working rule set rules against a host to make sure that everything is working correctly.
  3. Refine and retest the rules in the working rule set.
  4. Activate the rules in the working rule set. If you add a rule in a PowerCLI session and do not specify the NoActivate parameter, all rules that are currently in the working rule set are activated. You cannot activate individual rules. A PowerCLI sketch of this workflow follows below.
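A minimal PowerCLI sketch of that workflow, assuming a rule has already been created with New-DeployRule (the rule and host names below are placeholders):

$rule = Get-DeployRule -Name "TestRule"
Add-DeployRule -DeployRule $rule -NoActivate                   # 1. add the rule to the working rule set only
Get-VMHost esx01.lab.local | Test-DeployRuleSetCompliance      # 2. test a host against the rule set
# 3. refine the rule if needed, e.g. with Copy-DeployRule -DeployRule $rule -ReplaceItem <new item>
Add-DeployRule -DeployRule $rule                               # 4. add the rule without -NoActivate to put it in the active rule set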

AutoDeploy Boot Process

The boot process is different for hosts that have not yet been provisioned with vSphere Auto Deploy (first boot) and for hosts that have been provisioned with vSphere Auto Deploy and added to a vCenter Server system (subsequent boot).

First Boot Prerequisites

Before a first boot process, you must set up your system.

  • Set up a DHCP server that assigns an IP address to each host upon startup and that points the host to the TFTP server to download the iPXE boot loader from.
  • If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
  • Identify an image profile to be used in one of the following ways.
    • Choose an ESXi image profile in a public depot.
    • Create a custom image profile by using vSphere ESXi Image Builder, and place the image profile in a depot that the vSphere Auto Deploy server can access. The image profile must include a base ESXi VIB.
  • If you have a reference host in your environment, export the host profile of the reference host and define a rule that applies the host profile to one or more hosts.
  • Specify rules for the deployment of the host and add the rules to the active rule set.

First Boot Overview

When a host that has not yet been provisioned with vSphere Auto Deploy boots (first boot), the host interacts with several vSphere Auto Deploy components.


  1. When the administrator turns on a host, the host starts a PXE boot sequence. The DHCP Server assigns an IP address to the host and instructs the host to contact the TFTP server.
  2. The host contacts the TFTP server and downloads the iPXE file (executable boot loader) and an iPXE configuration file.
  3. iPXE starts executing. The configuration file instructs the host to make an HTTP boot request to the vSphere Auto Deploy server. The HTTP request includes hardware and network information.
  4. In response, the vSphere Auto Deploy server performs these tasks:
    1. Queries the rules engine for information about the host.
    2. Streams the components specified in the image profile, the optional host profile, and optional vCenter Server location information.
  5. The host boots using the image profile. If the vSphere Auto Deploy server provided a host profile, the host profile is applied to the host.
  6. vSphere Auto Deploy adds the host to the vCenter Server system that vSphere Auto Deploy is registered with.
    1. If a rule specifies a target folder or cluster on the vCenter Server system, the host is placed in that folder or cluster. The target folder must be under a data center.
    2. If no rule exists that specifies a vCenter Server inventory location, vSphere Auto Deploy adds the host to the first datacenter displayed in the vSphere Client UI.
  7. If the host profile requires the user to specify certain information, such as a static IP address, the host is placed in maintenance mode when the host is added to the vCenter Server system. You must reapply the host profile and update the host customization to have the host exit maintenance mode. When you update the host customization, answer any questions when prompted.
  8. If the host is part of a DRS cluster, virtual machines from other hosts might be migrated to the host after the host has successfully been added to the vCenter Server system.

Subsequent Boots Without Updates

For hosts that are provisioned with vSphere Auto Deploy and managed by a vCenter Server system, subsequent boots can become completely automatic.

  1. The administrator reboots the host.
  2. As the host boots up, vSphere Auto Deploy provisions the host with its image profile and host profile.
  3. Virtual machines are brought up or migrated to the host based on the settings of the host.
    • Standalone host. Virtual machines are powered on according to autostart rules defined on the host.
    • DRS cluster host. Virtual machines that were successfully migrated to other hosts stay there. Virtual machines for which no host had enough resources are registered to the rebooted host.

If the vCenter Server system is unavailable, the host contacts the vSphere Auto Deploy server and is provisioned with an image profile. The host continues to contact the vSphere Auto Deploy server until vSphere Auto Deploy reconnects to the vCenter Server system.

vSphere Auto Deploy cannot set up vSphere distributed switches if vCenter Server is unavailable, and virtual machines are assigned to hosts only if they participate in an HA cluster. Until the host is reconnected to vCenter Server and the host profile is applied, the switch cannot be created. Because the host is in maintenance mode, virtual machines cannot start.

Important: Any hosts that are set up to require user input are placed in maintenance mode

Subsequent Boots With Updates

You can change the image profile, host profile, vCenter Server location, or script bundle for hosts. The process includes changing rules and testing and repairing the host’s rule compliance.

  1. The administrator uses the Copy-DeployRule PowerCLI cmdlet to copy and edit one or more rules and updates the rule set.
  2. The administrator runs the Test-DeployRuleSetCompliance cmdlet to check whether each host is using the information that the current rule set specifies.
  3. The host returns a PowerCLI object that encapsulates compliance information.
  4. The administrator runs the Repair-DeployRuleSetCompliance cmdlet to update the image profile, host profile, or vCenter Server location the vCenter Server system stores for each host.
  5. When the host reboots, it uses the updated image profile, host profile, vCenter Server location, or script bundle for the host. If the host profile is set up to request user input, the host is placed in maintenance mode

Note: Do not change the boot configuration parameters to avoid problems with your distributed switch

Prepare your system for AutoDeploy

Before you can PXE boot an ESXi host with vSphere Auto Deploy, you must install prerequisite software and set up the DHCP and TFTP servers that vSphere Auto Deploy interacts with.

Prerequisites

  • Verify that the hosts that you plan to provision with vSphere Auto Deploy meet the hardware requirements for ESXi. See ESXi Hardware Requirements.
  • Verify that the ESXi hosts have network connectivity to vCenter Server and that all port requirements are met. See vCenter Server Upgrade.
  • Verify that you have a TFTP server and a DHCP server in your environment to send files and assign network addresses to the ESXi hosts that Auto Deploy provisions.
  • Verify that the ESXi hosts have network connectivity to DHCP, TFTP, and vSphere Auto Deploy servers.
  • If you want to use VLANs in your vSphere Auto Deploy environment, you must set up the end to end networking properly. When the host is PXE booting, the firmware driver must be set up to tag the frames with proper VLAN IDs. You must do this set up manually by making the correct changes in the UEFI/BIOS interface. You must also correctly configure the ESXi port groups with the correct VLAN IDs. Ask your network administrator how VLAN IDs are used in your environment.
  • Verify that you have enough storage for the vSphere Auto Deploy repository. The vSphere Auto Deploy server uses the repository to store data it needs, including the rules and rule sets you create and the VIBs and image profiles that you specify in your rules. Best practice is to allocate 2 GB to have enough room for four image profiles and some extra space. Each image profile requires approximately 350 MB. Determine how much space to reserve for the vSphere Auto Deploy repository by considering how many image profiles you expect to use.
  • Obtain administrative privileges to the DHCP server that manages the network segment you want to boot from. You can use a DHCP server already in your environment, or install a DHCP server. For your vSphere Auto Deploy setup, replace the gpxelinux.0 filename with snponly64.efi.vmw-hardwired for UEFI or undionly.kpxe.vmw-hardwired for BIOS.
  • Secure your network as you would for any other PXE-based deployment method. vSphere Auto Deploy transfers data over SSL to prevent casual interference and snooping. However, the authenticity of the client or the vSphere Auto Deploy server is not checked during a PXE boot.
  • If you want to manage vSphere Auto Deploy with PowerCLI cmdlets, verify that Microsoft .NET Framework 4.5 or 4.5.x and Windows PowerShell 3.0 or 4.0 are installed on a Windows machine. You can install PowerCLI on the Windows system on which vCenter Server is installed or on a different Windows system. See the vSphere PowerCLI User’s Guide.
  • Set up a remote Syslog server. See the vCenter Server and Host Management documentation for Syslog server configuration information. Configure the first host you boot to use the remote Syslog server and apply that host’s host profile to all other target hosts. Optionally, install and use the vSphere Syslog Collector, a vCenter Server support tool that provides a unified architecture for system logging and enables network logging and combining of logs from multiple hosts.
  • Install ESXi Dump Collector, set up your first host so that all core dumps are directed to ESXi Dump Collector, and apply the host profile from that host to all other hosts.
  • If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.

Starting to configure AutoDeploy

Step 1 – Enable the AutoDeploy, Image Builder Service and Dump Collector Service

  • Install vCenter Server or deploy the vCenter Server Appliance. The vSphere Auto Deploy server is included with the management node.
  • Configure the vSphere Auto Deploy service startup type.
  • On the vSphere Web Client Home page, click Administration.
  • Under System Configuration, click Services
  • Select Auto Deploy, click the Actions menu, and select Edit Startup Type and select Automatic
  • (Optional) If you want to manage vSphere Auto Deploy with the vSphere Web Client, configure the vSphere ESXi Image Builder service startup type
  • Check the Startup
  • Log out of the vSphere Web Client and log in again. The Auto Deploy icon is visible on the Home page of the vSphere Web Client
  • Enable the Dump Collector
  • You can now either set the dump collector manually on each host or configure the host profile with the settings
  • If you want to enter it manually and point the dump collector to the vCenter then the following commands are used
  • esxcli system coredump network set --interface-name vmk0 --server-ipv4 10.242.217.11 --server-port 6500
  • esxcli system coredump network set --enable true
  • Enable Automatic Startup

Step 2 – Configure the TFTP server

There are different options here. Some people use Solarwinds or there is the option now to use an inbuilt TFTP service on the vCenter

Important: The TFTP service in vCenter is only supported for dev and test environments, not production, and will be removed from future releases of vCenter. It is best to have a separate TFTP server.

Instructions

  • Now that Auto Deploy is enabled we can configure the TFTP server. Enable SSH on the VCSA by browsing to the Appliance Management page: https://VCSA:5480 where VCSA is the IP or FQDN of your appliance.
  • Log in as the root account. From the Access page enable SSH Login and Bash Shell.
  • SSH onto the vCenter Appliance, using a client such as Putty, and log in with the root account. First type shell and hit enter to launch Bash.
  • To start the TFTP service enter service atftpd start
  • Check the service is started using service atftpd status
  • To allow TFTP traffic through the firewall on port 69, we must run the following command. (Note the double dashes in front of dport)
  • iptables -A port_filter -p udp -m udp --dport 69 -j ACCEPT
  • Validate traffic is being accepted over port 69 using the following command
  •  iptables -nL | grep 69
  • iptables can be found in /etc/systemd/scripts just for reference
  • Type chkconfig atftpd on
  • To make the iptables rules persistent, load them after a reboot from a script.
  • Save the current active rules to a file

iptables-save > /etc/iptables.rules

  • Next create the below script and call it starttftp.sh
#! /bin/sh 
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd
### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO
service atftpd start
iptables-restore -c < /etc/iptables.rules
  • Put the starttftp.sh script in /etc/init.d via WinSCP
  • Put full permissions on the script
  • This should execute the command and reload the firewall tables after the system is rebooted
  • Reboot the vCenter appliance to test the script is running. If successful the atftpd service will be started and port 69 allowed, you can check these with service atftpd status and iptables -nL | grep 69.
  • Your TFTP directory is located at /tftpboot/
  • The TRAMP file on the vCenter must also now be modified and the DNS name removed and replaced with the IP address of the vCenter. Auto Deploy will not work without doing this part
  • The directory already contains the necessary files for Auto Deploy (tramp file, undionly.kpxe.vmw-hardwired, etc) Normally if you use Solarwinds TFTP server, you would need to download the TFTP Boot Zip and extract the files into the TFTP Root folder
  • Note there may be an issue with downloading this file due to security restrictions being enabled by some of the well known browsers – This is the likely message seen below. You may have to modify a browser setting in order to access the file
  • If everything is ok then you’ll be able to download it but note again, you do not need to download this if you are using the inbuilt TFTP server in vCenter as the files are already there.

Step 3 – Setting up DHCP options

  • The DHCP server assigns an IP address to the ESXi host when the host boots. The DHCP server also provides two required options to point the host to the TFTP server and to the boot files necessary for vSphere Auto Deploy to work. These additional DHCP options are as follows:
  • 066 – Boot Server Host Name – This option must be enabled, and the IP address of the server running TFTP should be inserted into the data entry field.
  • 067 – Bootfile Name – The “BIOS DHCP File Name” found in the vSphere Auto Deploy settings of your vCenter Server must be used here. The file name is undionly.kpxe.vmw-hardwired.
  • Go to Server Options and click Configure Options

  • In the value for option 066 (next-server) enter the FQDN of the TFTP boot server. In my case my vCenter Server hosting the TFTP service
  • Select option 67 and type in undionly.kpxe.vmw-hardwired. The undionly.kpxe.vmw-hardwired iPXE binary will be used to boot the ESXi host
  • Note: if you were using UEFI, you would need to put snponly64.efi.vmw-hardwired
  • You should now see the two options in DHCP
  • Next we need to add a scope and reservations to this scope
  • Right click IPv4 and select New Scope
  • A wizard will pop up
  • Put in a name and description
  • Put in the network IP range and subnet mask for the scope. Note: I have 3 hosts for testing.
  • Ignore the next screen and click Next
  • Ignore the next screen and click Next
  • Click No to configure options afterwards
  • Click Finish
  • We now need to create a DHCP reservation for each target ESXi host
  • In the DHCP window, navigate to DHCP > hostname > IPv4 > Autodeploy Scope > Reservations.
  • Right-click Reservations and select New Reservation.
  • In the New Reservation window, specify a name, IP address, and the MAC address for one of the hosts. Do not include the colon (:) in the MAC address.
  • The initial installation and setup is now finished and we can move on to the next stage
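For reference, the DHCP options, scope and reservation above can also be set with the DhcpServer PowerShell module on the Windows DHCP server. This is only a sketch: the TFTP server name, reservation name and MAC address are placeholders, while the IP range is the one used in this example.

Set-DhcpServerv4OptionValue -OptionId 66 -Value "vcsa.lab.local"                  # Boot Server Host Name (TFTP server)
Set-DhcpServerv4OptionValue -OptionId 67 -Value "undionly.kpxe.vmw-hardwired"     # Bootfile Name (BIOS hosts)
Add-DhcpServerv4Scope -Name "Autodeploy Scope" -StartRange 192.168.1.100 -EndRange 192.168.1.102 -SubnetMask 255.255.255.0
Add-DhcpServerv4Reservation -ScopeId 192.168.1.0 -IPAddress 192.168.1.100 -ClientId "AA-BB-CC-DD-EE-FF" -Name "esxi01"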

Stage 4 – Image Builder and AutoDeploy GUI

  • The next stage involves logging into myvmware.com and downloading an offline bundle of the version of ESXi you need
  • Go to Home > Autodeploy in vCenter and select Add a Software Depot
  • Click Software Depots and then click Import Software Depot and upload the bundle. Allowing space for 4 images is normally recommended.
  • Once uploaded, click on the depot and you should see the below
  • And
  • If you click on an image, you get options above where you can clone or export to an iso for example

Stage 5 – Creating a Deploy Rule

  • A deploy rule gives you control over the deployment process since you can specify which image profile is rolled out and on which server. Once a rule is created, you can also Edit or Clone it. Once created, the rule has to be activated for it to apply. If rules are not activated, Auto Deploy will fail
  • Click on the Deploy Rules tab and add a name
  • Next we want to select hosts that match the following pattern. There are multiple options
  • Asset
  • Domain
  • Gateway IPv4
  • Hostname
  • IPv4
  • IPv6
  • MAC address
  • Model
  • OEM string
  • Serial number
  • UUID
  • Vendor
  • I am going to use an IP range of my 3 hosts which is 192.168.1.100-192.168.1.102
  • Next Select an Image Profile
  • Select the ESXi image to deploy to the hosts, change the software depot from the drop down menu if needed, then click Next. If you have any issues with vib signatures you can skip the signature checks using the tick box.
  • Host Profile selection screen
  • Next Select a location
  • Next you will be on the Ready to Complete screen. Check the details and click Finish if you are happy
  • Note: The rule will be inactive – To use it, you will need to activate it but we will cover this in the next steps
  • The deploy rule is created but in an inactive state. Select the deploy rule and note the options: Activate / Deactivate, Clone, Edit, Delete. Click Activate / Deactivate; a new window will open. Select the newly created deploy rule and click Activate, Next, and Finish.
  • Now the deploy rule is activated; when you boot a host where the deploy rule is applicable you will see it load ESXi and the customization specified in the host profile. Deploy rules need to be deactivated before they can be edited.
  • You can setup multiple deploy rules using different images for different clusters or host variables. Hosts using an Auto Deploy ruleset are listed in the Deployed Hosts tab, hosts that didn’t match any deploy rules are listed under Discovered Hosts

Stage 6 – Reboot the ESXi host and see if the AutoDeploy deployment works as expected.

  • When you reboot a host, it will then come up as per the below screenprint
  • Once booted up, remediate the host
  • If you type in the following URL – https://<vCenter IP>:6502/vmw/rbd, it should take you to the Auto Deploy Debugging page where you can view registered hosts along with a detailed view of host and PXE information as well as the Auto Deploy Cache content

What do you do when you need to modify the Image Profile or Host Profile?

There are 2 commands you need to run to ensure the hosts can pick up the new data from the AutoDeploy rule, whether it be a new image or a new/modified host profile. If you don’t run these, you will likely find that when you reboot your vSphere hosts they still boot from the old image.

Test-DeployRuleSetCompliance <server-name>

Test-DeployRuleSetCompliance <server-name> | Repair-DeployRuleSetCompliance

This situation occurs when you update the active ruleset without updating the corresponding host entries in the auto deploy cache. The first time a host boots, the Auto Deploy server parses the host attributes against the active ruleset to determine (a) the image profile, (b) the host profile, and (c) the location of the host in the vCenter inventory. This information then gets saved in the auto deploy cache and reused on all future reboots. The strategy behind saving this information is to reduce the load on the auto deploy server by eliminating the need to parse each host against the rules engine on every reboot. With this approach each host only gets parsed against the active ruleset once (on the initial boot), after which the results get saved and reused on all subsequent reboots.

However, at some point you will make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location. When you make such changes, not only do you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet.

Use the “Test-DeployRuleSetCompliance” cmdlet to check if the host information saved on the auto deploy server is up-to-date.  This cmdlet parses the host attributes against the active ruleset and compares the results with the information saved in the cache.  If the saved information is incorrect (i.e. out of compliance) the cmdlet will return a status of “Non-Compliant” and show what needs to be updated.  If the information in the cache is correct, then the command will simply return an empty string.

Thanks to Kyle Gleed for his blog on the above

https://blogs.vmware.com/vsphere/2012/11/auto-deploy-host-booting-from-wrong-image-profile.html

Steps to test the DeployRuleSetCompliance

  • Connect to vCenter through Putty
  • In order to check one host, we can use Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local. It will tell us it is non-compliant
  • In order to repair a single host as a test, we can use the piped command below. If you get an empty string back, then the cache is now correct and ready to use the new image
  • Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local | Repair-DeployRuleSetCompliance
  • However, if we want to be clever about this because we have a lot of hosts, then we can run a quick simple PowerCLI “foreach” loop so we don’t have to update one host at a time
  • foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}
  • At this point, I would now start the TFTP service on the vCenter. Note: If you are using SolarWinds, this is not necessary, unless you want to double-check it is all OK first.
  • Next, reboot the hosts and check they come up as the right version; an example of our environment pre and post remediation is below

Other issues we faced!

Issue 1 – TFTP Service on the vCenter

We used the TFTP service which is inbuilt into the vCenter. What you will find if you use this is that it will start but then automatically stop itself after a while, which is fine; it’s just a case of remembering to start it. I found that with our HPE hosts, even after modifying the AutoDeploy rule and running the Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance cmdlets, the hosts were still booting from cache. In the ILO screen, you could see the host picking up a DHCP address and the DHCP service passing the TFTP server to the host, but then it timed out. Once the service was started on the vCenter it was fine.

service atftpd start

service atftpd status

Note: Apparently VMware do not support the inbuilt vCenter TFTP service, so when we asked how we could keep the service running, we were told they wouldn’t help with it. So it is probably best to install something like SolarWinds, which will keep the service running continuously.

Issue 2 – HPE Oneview Setting for PXE Boot

We found that with HPE BL460 blades with SSD cards in them, sometimes an empty host would boot up and lock a partition. This resulted in the host profile not being able to be applied, settings being all over the place, and there was absolutely no way of getting around it. We could only resolve it by using GParted to wipe the disk and boot again. There seemed to be no logic to it, though, as 5 out of 10 fresh hosts would boot up fine and 5 would not and would lock the partition.

This is what you would see if you hover over the error in vCenter

A dive into Host Profiles on vSphere 6.5

Host Profiles

As virtual infrastructures grow, it can become increasingly difficult and time consuming to configure multiple hosts in similar ways. Existing per-host processes typically involve repetitive and error-prone configuration steps. As a result, maintaining configuration consistency and correctness across the datacenter requires increasing amounts of time and expertise, leading to increased operational costs. Host Profiles eliminates per-host, manual or UI-based host configuration and maintains configuration consistency and correctness across the datacenter by using Host Profiles policies. These policies capture the blueprint of a known, validated reference host configuration, including the networking, storage, security and other settings.

You can then use this profile to:

• Automate host configuration across a large number of hosts and clusters. You can use Host Profiles to simplify the host provisioning process, configure multiple hosts in a similar way, and reduce the time spent on configuring and deploying new VMware ESX/ESXi hosts.

• Monitor for host configuration errors and deviations. You can use Host Profiles to monitor for host configuration changes, detect errors in host configuration, and ensure that the hosts are brought back into a compliant state. With Host Profiles, the time required to set up, change, audit and troubleshoot configurations drops dramatically due to centralized configuration and compliance checking. Not only does it reduce labor costs, but it also minimizes risk of downtime for applications/ virtual machines provisioned to misconfigured systems.

Accessing Host Profiles

Click Home > Host Profiles

You should see the below

What can we do with Host Profiles?

  1. Create a Host Profile
  2. Edit a Host Profile
  3. Extract a Host Profile from a host
  4. Attach a Host Profile to a host or cluster
  5. Check compliance
  6. Remediate a host
  7. Duplicate a Host Profile
  8. Copy settings from a host – If the configuration of the reference host changes, you can update the Host Profile so that it matches the reference host’s new configuration
  9. Import a Host Profile – .vpf
  10. Export a Host Profile – .vpf

Steps to create a profile

Host Profiles automates host configuration and ensures compliance in four steps:

Step 1: Create a profile, using the designated reference host. To create a host profile, VMware vCenter Server retrieves and encapsulates the configuration settings of an existing VMware ESX/ESXi host into a description that can be used as a template for configuring other hosts. These settings are stored in the VMware vCenter Server database and can be exported into the VMware profile format (.vpf).

Step 2: Attach a profile to a host or cluster. After you create a host profile, you can attach it to a particular host or cluster. This enables you to compare the configuration of a host against the appropriate host profile.

Step 3: Check the host’s compliance against a profile. Once a host profile is created and attached with a set of hosts or clusters, VMware vCenter Server monitors the configuration settings of the attached entities and detects any deviations from the specified “golden” configuration encapsulated by the host profile.

Step 4: Apply the host profile of the reference host to other hosts or clusters of hosts. If there is a deviation, VMware vCenter Server determines the configuration that applies to a host. To bring noncompliant hosts back to the desired state, the VMware vCenter Server Agent applies a host profile by passing host configuration change commands to the VMware ESX/ESXi host agent through the vSphere API.
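If you prefer to script these four steps, PowerCLI has equivalent cmdlets. The sketch below is only illustrative: the host, cluster and profile names are placeholders, and the parameter names are from memory, so check Get-Help on New-VMHostProfile, Apply-VMHostProfile and Test-VMHostProfileCompliance before using them.

# Step 1: extract a profile from a reference host
$hp = New-VMHostProfile -Name "Gold-ESXi-Profile" -ReferenceHost (Get-VMHost "esxi01.example.local")

# Step 2: attach (associate) the profile to a cluster
Apply-VMHostProfile -Entity (Get-Cluster "Prod-Cluster") -Profile $hp -AssociateOnly -Confirm:$false

# Step 3: check compliance for every host in the cluster
Get-Cluster "Prod-Cluster" | Get-VMHost | Test-VMHostProfileCompliance

# Step 4: remediate a non-compliant host (put it in maintenance mode first)
Apply-VMHostProfile -Entity (Get-VMHost "esxi02.example.local") -Profile $hp -Confirm:$false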

Steps to create a host profile

  1. In the Host Profiles view, click Extract Profile from a host

2. A wizard will pop up. Choose the vCenter, followed by the host you want to extract the profile from

3. Put in a name and description

4. Ready to Complete

5. A Host profile will be created and appear in the Host Profiles section

6. Edit the settings of the Host Profile by right-clicking on the profile and clicking Edit Settings

7. The Edit Host Profile screen will pop up

8. Click Next to get to the Settings screen

9. When you edit the Host Profile you can expand the configuration hierarchy to see the sub-profile components that make up the profile. These components are categorised by functional group or resource class to make it easier to find a particular parameter. Each sub-profile component contains one or more attributes and parameters, along with the policies and compliance checks

10. You can also mark settings as favourites by clicking the yellow star. You can then click View > Favourites to simplify searching for settings.

11. For example, we have a default shared datastore for storing logs, with each host logging under its own unique name. This saves us time configuring it manually

12. Note: There is an important setting if you are using a host profile with AutoDeploy. It dictates how ESXi is installed and how the install behaves on future reboots. vSphere has introduced the new options described below for deploying hosts. I will be doing a further blog about AutoDeploy using these settings

Stateless Caching

Upon provisioning, the ESXi image is written (cached) to the host’s local internal disk or a USB disk. This option is particularly useful when multiple ESXi hosts are being provisioned concurrently: rather than saturating the network, ESXi is re-provisioned from the cached image on the local or USB disk. Problems such as the below can still occur, though.

a) If the vCenter Server is available but the vSphere Auto Deploy server is unavailable, hosts do not connect to the vCenter Server system automatically. You can manually connect the hosts to the vCenter Server, or wait until the vSphere Auto Deploy server is available again.

b) If both vCenter Server and vSphere Auto Deploy are unavailable, you can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.

c) If vCenter Server is not available, vSphere DRS does not work. The vSphere Auto Deploy server cannot add hosts to the vCenter Server. You can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.

d) If you make changes to your setup while connectivity is lost, the changes are lost when the connection to the vSphere Auto Deploy server is restored.

Stateful Install

When the host first boots it pulls the image from the AutoDeploy server, then on all subsequent restarts the host boots from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.

13. Once we have finished customising our profile, we can save it; then we need to attach it to our hosts

14. Click the Attach/Detach Hosts and Clusters button within Host Profiles. A wizard will appear. I’m just going to test one of my hosts first and select Attach. Keep Skip Host Customization unticked so we can see where any missing information needs entering.

15. You will likely get some host customization errors, as I did; I needed to fill in the DNS name of my host and add a username and password to join the hosts to the domain.

16. Next click on the button to check host compliance

17. I can see that one of my hosts is not compliant so I will see what I need to adjust

18. So I double-check all my settings and find that, yes, there is a mismatch in the firewall configuration for esxupdate, and there are different values between hosts for the syslog settings. I’ll adjust these and run the Check Host Compliance again.

19. Lo and behold, I now have 3 compliant hosts 🙂

Reference Host setup for Autodeploy

A well-designed reference host connects to all services such as syslog, NTP, and so on. The reference host setup might also include security, storage, networking, and ESXi Dump Collector. You can apply such a host’s setup to other hosts by using host profiles.

The exact setup of your reference host depends on your environment, but you might consider the following customization.

NTP Server Setup

When you collect logging information in large environments, you must make sure that log times are coordinated. Set up the reference host to use the NTP server in your environment that all hosts can share. You can specify an NTP server by running the vicfg-ntp command. You can start and stop the NTP service for a host with the vicfg-ntp command, or the vSphere Web Client.

Edit the Host profile with the settings for your NTP service

Syslog Server Setup

All ESXi hosts run a syslog service (vmsyslogd), which logs messages from the VMkernel and other system components to a file. You can specify the log host and manage the log location, rotation, size, and other attributes by running the esxcli system syslog vCLI command or by using the vSphere Web Client. Setting up logging on a remote host is especially important for hosts provisioned with vSphere Auto Deploy that have no local storage. You can optionally install the vSphere Syslog Collector to collect logs from all hosts.
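As a worked example, pointing the reference host at a remote syslog server from the ESXi shell might look like the below (the syslog server name is a placeholder; the same values are then captured in the host profile):

# Set the remote log host and reload the syslog daemon
esxcli system syslog config set --loghost='udp://syslog.example.local:514'
esxcli system syslog reload
# Make sure the outbound syslog firewall ruleset is enabled
esxcli network firewall ruleset set --ruleset-id=syslog --enabled=true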

Edit the Host profile with the below 2 settings

Core Dump Setup

You can set up your reference host to send core dumps to a shared SAN LUN, or you can enable ESXi Dump Collector in the vCenter appliance and configure the reference host to use ESXi Dump Collector. After setup is complete, VMkernel memory is sent to the specified network server when the system encounters a critical failure.

Turn on the Dump Collector service in vCenter

Configure the host profile to enable and point the host to the vCenter on port 6500
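As a hedged example (flag names as I recall them; confirm with esxcli system coredump network set --help), configuring a host to send dumps to the Dump Collector on vCenter over port 6500 looks roughly like this:

# Point network core dumps at the Dump Collector (vCenter IP is a placeholder)
esxcli system coredump network set --interface-name vmk0 --server-ipv4 192.168.10.10 --server-port 6500
esxcli system coredump network set --enable true
# Verify the configuration and connectivity to the collector
esxcli system coredump network check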

Security Setup

In most deployments, all hosts that you provision with vSphere Auto Deploy must have the same security settings. You can, for example, set up the firewall to allow certain services to access the ESXi system, and set up the security configuration, user configuration, and user group configuration for the reference host with the vSphere Web Client or with vCLI commands. Security setup includes shared user access settings for all hosts. You can achieve unified user access by setting up your reference host to use Active Directory. See the vSphere Security documentation.

vCenter 6.5U2c VCHA Error

Random problem when setting up vCenter HA

So this is an interesting one because I don’t have a solution but it is now working so I can only explain what happened. The 3 blades hosting the vCenter were HPE Proliant BL460c Gen10 servers. Once I reached the end of configuring a 6.5U2c vCenter for vCenter HA, I received the following error message.

After going back and double-checking for typos and checking the distributed switch and port group settings, everything looked fine, but the error, as you can see, specifically mentioned host vmh01. So I decided to run the VCHA wizard again, which produced the same error but listed the second host


As I had 3 hosts in the cluster, I decided to run the wizard a third time, which errored on the third host, but running it a fourth time the VCHA setup ran perfectly and finished without any problems. There was no problem with the vDS, port groups or general networking.

The great thing about VCHA is that in this instance it rolls everything back so you can simply start again. You might ask why I hadn’t taken a snapshot – well, it doesn’t allow you to do this! The rollback works very well, in fact 3 times in this scenario. Obviously not so good if you have hundreds of hosts 😀 A very strange problem where the NICs seemed to need a push before deciding to work, but it did work in the end.

 

VMware Network Port Diagram v6.x

This is a short post because I came across a really useful link to a PDF document showing how vCenter and ESXi are connected. It shows which ports are used and the direction the traffic flows in, which is really useful for understanding the internal and external communication, with explanations for each port.

Link

Downloadable PDF within the KB

https://kb.vmware.com/s/article/2131180

 

Installing vCenter HA – 6.5U2c

Installing vCenter HA

The vCenter High Availability architecture uses a three-node cluster to provide availability against multiple types of hardware and software failures. A vCenter HA cluster consists of one Active node that serves client requests, one Passive node to take the role of Active node in the event of failure, and one quorum node called the Witness node. Any Active and Passive node-based architecture that supports automatic failover relies on a quorum or a tie-breaking entity to solve the classic split-brain problem, which refers to data/availability
inconsistencies due to network failures within distributed systems maintaining replicated data. Traditional architectures use some form of shared storage to solve the split-brain problem. However, in order to support a vCenter HA cluster spanning multiple datacenters, our design does not assume a shared storage–based deployment. As a result, one node in the vCenter HA cluster is permanently designated as a quorum node, or a Witness node. The other two nodes in the cluster dynamically assume the roles of Active and Passive nodes.
vCenter Server availability is assured as long as there are two nodes running inside a cluster. However, a cluster is considered to be running in a degraded state if there are only two nodes in it. A subsequent failure in a degraded cluster means vCenter services are no longer available.

A vCenter Server appliance is stateful and requires a strong, consistent state for it to work correctly. The appliance state (configuration state or runtime state) is mainly composed of:

• Database data (stored in the embedded PostgreSQL database)
• Flat files (for example, configuration files).

The appliance state must be backed up in order for VCHA failover to work properly. For the state to be stored inside the PostgreSQL database, we use the PostgreSQL native replication mechanism to keep the database data of the primary and secondary in sync. For flat files, a Linux native solution, rsync, is used for replication.
Because the vCenter Server appliance requires strong consistency, it is a strong requirement to utilize a synchronous form of replication to replicate the appliance state from the Active node to the Passive node

Installing vCenter HA

  • Download the relevant vCenter Server Appliance ISO from the VMware download page (vCenter HA is deployed from the standard appliance installer)
  • Mount the iso from a workstation or server

  • We’ll now go through the process of installing the first vCenter Server. I have mounted the iso on my Windows 10 machine
  • Go to vcsa-ui-installer > win32 > installer.exe

  • Click Install

  • Click Next

  • Click Accept License Agreement

  • Select Embedded Platform Services Controller. Note: you can deploy an external PSC. I am doing the install this way as I want to test the embedded linked mode functionality now available in 6.5U2+ between embedded Platform Services Controllers (this will require the build of another vCenter HA with an embedded PSC, which I’ll try and cover in another blog)

  • Next put in the details for a vCenter or host as the deployment target

  • Select the Certificate

  • Put in an appliance name and root password for the new vCenter appliance

  • Choose the deployment size and the Storage Size. Click Next

  • Choose the datastore to locate the vCenter on. Note: I am running vSAN.

  • Configure network settings. Note: As I chose a host to deploy to, it does not give me any existing vDS port groups. I have chosen to deploy to a host rather than an existing vCenter as I am testing this for a Greenfield build at work which does not have any existing vCenters etc to start with, just hosts.
  • Note: It would be useful at this point to make sure you have entered the new vCenter name and IP address into DNS.

  • Check all the details are correct

  • Click Finish. It should now say Initializing and start deploying

  • You should see the appliance is being deployed.

 

  • When the deployment has finished, you should see this screen.

  • You can carry on with Step 2 at this point, but I closed the wizard here and I’m now going to log in to my vCenter and configure the appliance settings on https://techlabvca002.techlab.local:5480
  • Click Set up vCenter Server Appliance

  •  Log in to the vCenter

  • The below screen will pop up. Click Next

  • Check all details
  • Put in time servers. I’m connected to the internet through my environment so I use some generic time servers
  • Enable SSH if you need to – can be turned off again after configuration for security.

  • Put in your own SSO configuration
  • Click Next

  • Select or unselect the CEIP

  • Check all the details and click Finish

  • A message will pop up

  • The vCenter Appliance will begin the final installation

  • When complete, you should see the following screen

  • You can now connect to the vCenter Appliance on the 5480 port and the Web Client

  • Note: at this point I actually switched to enabling VCHA on my original vCenter, techlabvca001. I should have added my second vCenter into the same SSO domain as techlabvca001, but I set it up as a completely separate vCenter, so VCHA could not be enabled the way I had set it up. Log into the vSphere Web Client for techlabvca001
  • Highlight vCenter
  • Click the Configure tab
  • Choose Basic

  • Put in the Active vCenter’s HA address and subnet mask
  • Choose a port group

  • Click Next
  • Select Advanced and change the IP settings to what you want

  • Passive Node

  • And the Witness Node

 

  • Click Next and you will be on the next screen which allows you to specify what location and datastores you can use to place the nodes

  • Click Edit on the Passive Node

  • Select the Compute Resource

  • Choose a datastore – In my case this will be my vSAN

  • Check the compatibility checks – in my case it is just notifying me about snapshots being lost when this is created.

  • Next adjust the Witness settings – I am not going to go through them all again as they will be the same as the Passive node we just did.

  • Check the Management network and vCenter HA networks

  • Next and check the final details and click Finish

  • It will now say vCenter HA is being deployed in the vSphere Web Client

  • You should see a Peer machine and a Witness machine being deployed

  • Once complete you will see VCHA is enabled and you should see your Active vCenter, Passive vCenter and Witness

  • Click the Test Failover to check everything is working as expected

  • You can also place the HA Cluster in several modes

Updating SSL Certificates on vCenter and Platform Services Controllers

vCenter services use SSL to communicate securely with each other and with ESXi. SSL communications ensure data confidentiality and integrity. Data is protected and cannot be modified in transit without detection.

vCenter Server services such as the vSphere Web Client also use certificates for initial authentication to vCenter Single Sign-On. vCenter Single Sign-On provisions each set of services (solution user) with a SAML token that the solution user can authenticate with.

In vSphere 6.0 and later, the VMware Certificate Authority (VMCA) provisions each ESXi host and each vCenter Server service with a certificate that is signed by VMCA by default.

You can replace the existing certificates with new VMCA-signed certificates, make VMCA a subordinate CA, or replace all certificates with custom certificates.

Requirements for imported certificates

  • Key size: 2048 bits or more (PEM encoded)
  • PEM format. VMware supports PKCS8 and PKCS1 (RSA keys). When you add keys to VECS, they are converted to PKCS8.
  • x509 version 3
  • SubjectAltName must contain DNS Name=machine_FQDN
  • CRT format
  • Contains the following Key Usages: Digital Signature, Key Encipherment.
  • Client Authentication and Server Authentication cannot be present under Enhanced Key Usage

VMCA does not support the following certificates

  1. Certificates with wildcards
  2. The algorithms md2WithRSAEncryption 1.2.840.113549.1.1.2, md5WithRSAEncryption 1.2.840.113549.1.1.4, and sha1WithRSAEncryption 1.2.840.113549.1.1.5 are not recommended.
  3. The algorithm RSASSA-PSS with OID 1.2.840.113549.1.1.10 is not supported.

The work required for setting up or updating your certificate infrastructure depends on the requirements in your environment, on whether you are performing a fresh install or an upgrade, and on whether you are considering ESXi or vCenter Server.

What is the VMCA?

The VMware Certificate Authority (VMCA) is the default root certificate authority introduced in vSphere 6.0 that supplies the certificates to ensure communication over SSL between vCenter Server components and ESXi nodes in the virtualized infrastructure.

The VMCA is included in the Platform Services Controller and provides certificates for

  • Solution users (Replacing Solution user certificates is not normally required by company policy)
  • Machines that have running services
  • ESXi hosts. An ESXi host gets a signed certificate from the VMCA, stored locally on the host. For environments that require a different root authority, an administrator must change the relevant option in vCenter to stop automatically provisioning VMCA certificates to hosts (the setting is noted below).
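For reference, the vCenter advanced setting that controls this behaviour is vpxd.certmgmt.mode. The value below is shown for illustration; verify the option and its accepted values in your build before changing it.

vpxd.certmgmt.mode = custom    # default is vmca; thumbprint is a legacy mode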

If you do not currently replace VMware certificates, your environment starts using VMCA-signed certificates instead of self-signed certificates.

What is the VECS?

VMware Endpoint Certificate Store (VECS) serves as a local (client-side) repository for certificates, private keys, and other certificate information that can be stored in a keystore. You can decide not to use VMCA as your certificate authority and certificate signer, but you must use VECS to store all vCenter certificates, keys, and so on. ESXi certificates are stored locally on each host and not in VECS. VECS runs on every embedded deployment, Platform Services Controller node, and management node and holds the keystores that contain the certificates and keys.

How does VMCA deal with certificates?

With VMCA you can deal with certificates in three different ways.

  • VMCA Default

VMCA uses a self-signed root certificate. It issues certificates to vCenter, ESXi, etc and manages these certificates. These certificates have a chain of trust that stops at the VMCA root certificate. VMCA is not a general-purpose CA and its use is limited to VMware components.

  • VMCA Enterprise

VMCA is used as a subordinate CA and is issued a subordinate CA signing certificate. It can then issue certificates that chain up to the enterprise CA’s root certificate. If you have already issued certs using VMCA Default and then replace VMCA’s root cert with a CA signing cert, all certificates issued will be regenerated and pushed out to the components

  • Custom

In this scenario VMCA is completely bypassed. This scenario is for customers that want to issue and/or install their own certificates from their own internal PKI, or third-party signed certificates generated from an external PKI such as Verisign or GoDaddy. You will need to issue a cert for every component. All those certs (except for host certs) need to be installed into VECS.

In Default and Enterprise modes, VMCA certificates can be easily regenerated on demand.

Certificate Manager Tool

For vSphere 6, the procedure for installing certificates has changed. A new Certificate Manager tool is shipped as part of vCenter for Windows and the VCSA. The location on the VCSA is below

/usr/lib/vmware-vmca/bin/certificate-manager

Deployments

I’m going to use a custom deployment method to just change machine certs but not ESXi host or Solution certificates.

Hybrid Deployment

You can have VMCA supply some of the certificates, but use custom certificates for other parts of your infrastructure. For example, because solution user certificates are used only to authenticate to vCenter Single Sign-On, consider having VMCA provision those certificates. Replace the machine SSL certificates with custom certificates to secure all SSL traffic.

Company policy often does not allow intermediate CAs. For those cases, hybrid deployment is a good solution. It minimizes the number of certificates to replace and secures all traffic. The hybrid deployment leaves only internal traffic, that is, solution user traffic, to use the default VMCA-signed certificates

Where vSphere uses certificates

ESXi Certificates

  • Stored locally on each host in the /etc/vmware/ssl directory
  • ESXi certificates are provisioned by VMCA by default when the host is first added to vCenter and the host reconnects, but you can use custom certificates instead

Machine SSL Certificates

  • The machine SSL certificate for each node is used to create an SSL socket on the server side
  • SSL clients connect to the SSL socket
  • Used for server verification and for secure communications such as HTTPS or LDAPS
  • Each node has its own machine SSL certificate. Nodes include vCenter, Platform Services Controller or embedded deployment instance
  • VMware products use standard X.509 version 3 (X.509v3) certificates to encrypt session information. Session information is sent over SSL between components.

The following services use the machine SSL certificate

  • The reverse proxy service on each Platform Services Controller node. SSL connections to individual vCenter services always go to the reverse proxy. Traffic does not go to the services themselves.
  • The vCenter service (vpxd) on management nodes and embedded nodes.
  • The VMware Directory Service (vmdir) on infrastructure nodes and embedded nodes.

Solution User Certificates

  • A solution user encapsulates one or more vCenter Server services. Each solution user must be authenticated to vCenter Single Sign-On. Solution users use certificates to authenticate to vCenter Single Sign-On through SAML token exchange

  • A solution user presents the certificate to vCenter Single Sign-On when it first must authenticate, after a reboot, and after a timeout has elapsed. The timeout (Holder-of-Key Timeout) can be set from the vSphere Web Client or Platform Services Controller Web interface and defaults to 2592000 seconds (30 days).

The following solution user certificate stores are included in VECS on each management or embedded node.

Managing certificates with the vSphere Certificate Manager Utility

There are a few ways of managing certificates but I am going to run through the vSphere Certificate Manager Utility.

The vSphere Certificate Manager utility allows you to perform most certificate management tasks interactively from the command line. vSphere Certificate Manager prompts you for the task to perform, for certificate locations and other information as needed, and then stops and starts services and replaces certificates for you.

If you use vSphere Certificate Manager, you are not responsible for placing the certificates in VECS (VMware Endpoint Certificate Store) and you are not responsible for starting and stopping services.

Before you run vSphere Certificate Manager, be sure you understand the replacement process and procure the certificates that you want to use

Certificate Manager Utility Location

Procedure

  • First of all I need to create a template in my own internal Certificate Authority. I’m going to follow the article below, with steps and screenshots showing what I’m doing.

https://kb.vmware.com/s/article/211200

  1. Connect to the CA server that you will be generating the certificates from, through an RDP session.
  2. Click Start > Run, type certtmpl.msc, and click OK.
  3. In the Certificate Template Console, under Template Display Name, right-click Web Server and click Duplicate Template.

  • In the Duplicate Template window, select Windows Server 2003 Enterprise for backward compatibility. Note: If you have an encryption level higher than SHA1, select Windows Server 2008 Enterprise.

  • Click the General Tab
  • In the Template display name field, enter vSphere 6.x as the name of the new template

  1. Click the Extensions tab.
  2. Select Application Policies and click Edit.
  3. Select Server Authentication and click Remove, then OK.
  4. Note: If Client Authentication exists, remove this from Application Policies as well.

  • Select Key Usage and click Edit
  • Select the Signature is proof of origin (nonrepudiation) option. Leave all other options as default
  • Click OK.

  • Click on the Subject Name tab.
  • Ensure that the Supply in the request option is selected.
  • Click OK to save the template.

Next: add the new template to the Certificate Templates section on the CA to make the newly created certificate template available

  • Connect to the CA server that you will be generating the certificates from, through an RDP session.
  • Click Start > Run, type certsrv.msc, and click OK.
  • In the left pane of the Certificate Console, if collapsed, expand the node by clicking the + icon.

  • Right-click Certificate Templates and click New > Certificate Template to Issue.
  • Locate vSphere 6.x or vSphere 6.x VMCA under the Name column.
  • Click OK.

  • You will then see the certificate template

Next: Create a folder on the VCSA for uploading and downloading certificates

  • WinSCP into the VCSAs/PSCs and create a folder that you can upload and download to. E.g. /tmp/machine_ssl

shell.set --enabled True

shell

chsh -s /bin/bash root
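With the bash shell enabled, you can also create the upload folder from the same SSH session instead of via WinSCP (path as per the example above):

# Create the working folder for the CSR, key and certificates
mkdir -p /tmp/machine_ssl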

Generate the Certificate signing request

Note: If you have external PSCs, do these first before doing the vCenters.

The Machine SSL certificate is the certificate you get when you open the vSphere Web Client in a web browser. It is used by the reverse proxy service on every management node, Platform Services Controller, and embedded deployment. You can replace the certificate on each node with a custom certificate.

  1. In PuTTY, navigate to /usr/lib/vmware-vmca/bin/ and run ./certificate-manager
  2. Put in the administrator@vsphere.local account and password

  1. Select Option 1 to Replace Machine SSL certificate with Custom Certificate
  2. Put in the path to the /tmp/machine_ssl folder on the appliance

  • Enter all the relevant cert info
    • Output directory path: the path where the private key and the request will be generated
    • Country: your country in two letters
    • Name: the FQDN of your vCSA
    • Organization: an organization name
    • OrgUnit: the name of your unit
    • State: your state or province
    • Locality: your city
    • IPAddress: the vCSA IP address
    • Email: your e-mail address
    • Hostname: the FQDN of your vCSA
    • VMCA Name: the FQDN where your VMCA is located, usually the vCSA FQDN

  • You will then see the generated csr and key in the /tmp/machine_ssl folder

  • Open the vmware_issued_csr.csr file and copy the contents
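Before submitting the request to the CA, it is worth sanity-checking its contents (CN and SubjectAltName) with openssl. A minimal check, assuming the folder and file name generated above:

# Print the request details so you can confirm the subject and SAN entries
openssl req -in /tmp/machine_ssl/vmware_issued_csr.csr -noout -text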

Next: Request a certificate from your CA.

The next step is to use the CSR to request a certificate from your internal Certificate Authority.

  1. Log in to the Microsoft CA certificate authority Web interface. By default, it is http://CA_Server_FQDN/CertSrv.
  2. Click the Request a certificate (.csr) link.

  3. Click advanced certificate request.

  4. Click the Submit a certificate request by using a base-64-encoded CMC or PKCS #10 file, or submit a renewal request by using a base-64-encoded PKCS #7 file link.

  5. Open the certificate request (machine_ssl.csr) in a plain text editor and copy from -----BEGIN CERTIFICATE REQUEST----- to -----END CERTIFICATE REQUEST----- into the Saved Request box.

  • On the download page, select “Base 64 encoded” and click “Download Certificate”. The downloaded file will be called “certnew.cer”. Rename this to “machine_ssl.cer”
  • Go back to the download web page and click “Download certificate chain” (ensuring that “Base 64 encoded” is still selected). The downloaded file will be called “certnew.p7b”. Rename this to “cachain.p7b”

  • We are now going to export the CA root certificate from the cachain.p7b file. Right-click on the cachain.p7b file and select “Open”

  • Expand the list and click on the Certificates folder. Right-click on the CA root cert (techlab-TECHLABDCA001-CA in this example), and select All Tasks > Export

Select Base 64 encoded

  • Save the file as root-64.cer

  • You should now have the machine_ssl.cer file and the root-64.cer file
  • Using WinSCP copy the machine_ssl.cer and root-64.cer certificate files to the VCSA.
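A quick way to confirm the issued certificate chains correctly to the root you exported, before importing anything, is openssl verify (same file names as above; this assumes a single-tier CA as in this example):

# Should print machine_ssl.cer: OK if the chain is valid
openssl verify -CAfile root-64.cer machine_ssl.cer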

  • Now that the files have been copied, open the Certificate Manager Utility and select Option 1, Replace Machine SSL certificate with Custom Certificate.

  • Provide the password to your administrator@vsphere.local account and select Option 2, “Import Custom Certificate(s) and key(s) to replace existing Machine SSL certificate”

  • You will be prompted for following files:
  • machine_ssl.cer
  • machine_ssl.key
  • root-64.cer

  • Type Y to begin the process
  • It will kick off the install

  • You should get a message to say that everything is completed

  • Now check to see if everything has gone to plan. One thing to remember before we start: because the new Machine SSL cert has been issued by the CA on the domain controller, you may need to install the root-64.cer file into the browser’s trusted store. Once done, close the browser and log into the vSphere Web Client.
  • Now open your vCenter login page and check the certificate being used to protect it
  • You’ll see that the certificate has been verified by “techlab-TECHLABADC001-CA”. This is the CA running on the Windows domain controller.
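You can also check the new certificate from the command line rather than the browser. A quick one-liner from any machine with openssl (the vCenter FQDN is a placeholder):

# Show the issuer, subject and validity dates of the certificate presented on port 443
echo | openssl s_client -connect techlabvca001.techlab.local:443 2>/dev/null | openssl x509 -noout -issuer -subject -dates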

vSAN Stretched Cluster networking


vSAN Stretched Cluster networking

A vSAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement. Setting up a stretched cluster can be daunting. More in terms of the networking side than anything else. This blog isn’t meant to be chapter and verse on vSAN stretched clusters. It is meant to help anyone who is setting up the networking, static routes and ports required for a L2 and L3 implementation.

VMware vSAN Stretched Clusters with a Witness Host refers to a deployment where a user sets up a vSAN cluster with 2 active/active sites with an identical number of ESXi hosts distributed evenly between the two sites. The sites are connected via a high bandwidth/low latency link.

The third site hosting the vSAN Witness Host is connected to both of the active/active data-sites. This connectivity can be via low bandwidth/high latency links.

Each site is configured as a vSAN Fault Domain. The way to describe a vSAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual machines are deployed. The minimum supported configuration is 1+1+1(3 nodes). The maximum configuration is 15+15+1 (31 nodes). In vSAN Stretched Clusters, there is only one witness host in any configuration.

A virtual machine deployed on a vSAN Stretched Cluster will have one copy of its data on site A, a second copy of its data on site B and any witness components placed on the witness host in site C.

Types of networks

VMware recommends the following network types for Virtual SAN Stretched Cluster:

  • Management network: L2 stretched or L3 (routed) between all sites. Either option works fine. The choice is left up to the customer.
  • VM network: VMware recommends L2 stretched between data sites. In the event of a failure, the VMs will not require a new IP to work on the remote site
  • vMotion network: L2 stretched or L3 (routed) between data sites. Either option works fine. The choice is left up to the customer.
  • Virtual SAN network: VMware recommends L2 stretched between the two data sites and L3 (routed) network between the data sites and the witness site.

The major consideration when implementing this configuration is that each ESXi host comes with a default TCP/IP stack and, as a result, only has a single default gateway. The default route is typically associated with the management network TCP/IP stack. The solution to this issue is to use static routes, which allow an administrator to define a new routing entry indicating which path should be followed to reach a particular network. Static routes are needed between the data hosts and the witness host for the vSAN network, but they are not required for the data hosts on different sites to communicate with each other over the vSAN network. However, in the case of stretched clusters, it might also be necessary to add a static route from the vCenter Server to reach the management network of the witness ESXi host if it is not routable, and similarly a static route may need to be added to the ESXi witness management network to reach the vCenter Server. This is because the vCenter Server will route all traffic via the default gateway.

vSAN Stretched Cluster Visio diagram

The below diagram is for referring to and below this, the static routes are listed so it is clear what needs to connect.

Static Routes

The recommended static routes are

  • Hosts on the Preferred Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
  • Hosts on the Secondary Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
  • The Witness Host on the Witness Site have static route added so that requests to reach the Preferred Site and Secondary Site are routed out the WitnessPg VMkernel interface

On each host on the Preferred and Secondary site

These were the manual routes added

  • esxcli network ip route ipv4 add -n 192.168.1.0/24 -g 172.31.216.1  (192.168.1.0/24 being the witness vSAN network and 172.31.216.1 being the gateway on the data hosts’ vSAN VMkernel network)
  • esxcli network ip route ipv4 list will show you the networking
  • vmkping -I vmk1 192.168.1.10 will confirm via ping that the network is reachable

On the witness

These were the manual routes added

  • esxcli network ip route ipv4 add -n 172.31.216.0/25 -g 192.168.1.1 (172.31.216.0/25 being the data hosts’ vSAN VMkernel network and 192.168.1.1 being the gateway on the witness vSAN VMkernel network)
  • esxcli network ip route ipv4 list will show you the networking
  • vmkping -I vmk1 172.31.216.10 will confirm via ping that the network is reachable

Port Requirements

Virtual SAN Clustering Service

12345, 23451 (UDP)

Virtual SAN Cluster Monitoring and Membership Directory Service. Uses UDP-based IP multicast to establish cluster members and distribute Virtual SAN metadata to all cluster members. If disabled, Virtual SAN does not work.

Virtual SAN Transport

2233 (TCP)

Virtual SAN reliable datagram transport. Uses TCP and is used for Virtual SAN storage I/O. If disabled, Virtual SAN does not work.

vSANVP

8080 (TCP)

vSAN VASA Vendor Provider. Used by the Storage Management Service (SMS) that is part of vCenter to access information about Virtual SAN storage profiles, capabilities and compliance. If disabled, Virtual SAN Storage Profile Based Management does not work

Virtual SAN Unicast agent to witness 

12321 (UDP)

Self explanatory as needed for unicast from data nodes to witness.
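To confirm the routes and ports are actually reachable between sites, here are a couple of checks run from a data host (addresses are placeholders based on the examples above):

# Check the vSAN VMkernel path to the witness with don't-fragment set (1472 + 28 bytes of headers = 1500 MTU)
vmkping -I vmk1 -d -s 1472 192.168.1.10
# Check that the witness answers on the TCP transport port (nc is available in the ESXi shell)
nc -z 192.168.1.10 2233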

vSAN Storage Hub

The link below is to the VMware Storage Hub, which is the central location for all things vSAN, including the vSAN stretched cluster guide, which is exportable to PDF. Pages 66-67 are relevant to networking/static routes.

https://storagehub.vmware.com/t/vmware-vsan/vsan-stretched-cluster-2-node-guide/network-design-considerations/