This error comes up in a 6.5 U2 environment running on HP ProLiant BL460c Gen9 servers. Where we previously had fully compliant hosts, checking the compliance of a host against a host profile now shows only Host Compliance Unknown.
The Resolution
I’m still not entirely sure how we went from a state of Compliant hosts to Unknown, but this is how we resolved it:
Deleted the AutoDeploy Rule
Recreated the AutoDeploy Rule
Ran the below command. Anytime you make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet, or by running the single PowerCLI command below.
foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}
Checked the status of the atftpd service in vCenter. Note: We are using the inbuilt TFTP server in vCenter; this is no longer supported by VMware, but we find it works just fine. SolarWinds is a good alternative.
service atftpd status
service atftpd start
Rebooted one host while checking the startup in the ILO console.
You may need to remediate your hosts after boot-up, or, if everything is configured correctly, each host will simply boot up, add itself into the cluster and become compliant.
Hitachi Storage Replication Adapter (SRA) is an interface that integrates Hitachi storage systems and replication software with VMware® vCenter SRM™ processes.
What is the Hitachi CCI?
Hitachi’s remote and in-system replication software require CCI to manage the pairs. The adapter plug-in links CCI with Site Recovery Manager. There are two CCI components:
Command devices, which reside on the storage systems. CCI uses the command device as the interface to the storage system from the host. The command device accepts commands from the host and executes them on the storage system. The command device is a dedicated logical volume.
Hitachi Open Remote Copy Manager (HORCM), which resides on the CCI server. HORCM operates as a daemon process. When activated, HORCM refers to CCI configuration definition files, also located on the server. The HORCM instance communicates with the storage system and remote servers. HORCM definition files describe the storage systems, pair volumes, and data paths. When a user issues a command, CCI uses the information in the HORCM files to identify which volumes are the targets of the command. Two HORCM files are needed for each pair. One file describes the primary volumes (P-VOLs), which are also referred to as “protected volumes”, and the other describes the secondary volumes (S-VOLs), which are also referred to as “recovery volumes”.
VMware SRM and Hitachi Components
Installation Steps
Ask the Storage Team to present a 50 MB LUN to the hosts. This will be the command device. Edit the settings of each Primary and Secondary SRM VM and add the 50 MB LUN as an RDM. Log into each SRM VM and bring the disk online and initialise it, but do not format it.
The storage team need to make sure the Command Device has the following settings on the Hitachi side or the HORCM service will not run correctly.
Go to the SRM installer and Run as Administrator
Select a language
Click Next
Click Next
Accept the License agreement
Check Prerequisites
Change the install directory if you want, or leave it on the C drive. We install ours on the D drive.
Put in the vCenter name (if you have embedded vCenters) followed by administrator@vsphere.local and the password
Select a vCenter Server to register to
Fill in the Site name
Fill in an email address
Fill in the IP address for the SRM server
Choose the Default Site Recovery Manager Plug-in identifier
Select what certificate to use. I have generated a PKCS#12 cert so I will use a signed certificate
Note: When I generated the certificate through OpenSSL, I specified a password which is what you will need to enter when adding the certificate – https://kb.vmware.com/s/article/2085644
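For reference, generating such a certificate with OpenSSL looks roughly like the following. This is only a sketch: the srm.key/srm.csr/srm.crt/srm.p12 filenames are placeholders, and your CA must sign the CSR in between the two commands.
openssl req -new -newkey rsa:2048 -nodes -keyout srm.key -out srm.csr
openssl pkcs12 -export -in srm.crt -inkey srm.key -name "srm" -out srm.p12
The -export step prompts for an export password, which is the password you then enter in the SRM installer when adding the certificate.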
The certificate will have a .p12 extension
Choose the embedded option as this now supports a full installation of SRM
Enter the details in the Embedded Database Configuration
Enter the Site Recovery Manager Service Account
Click Finish to start the installation
You will see the installer creating the SRM Database
And
When it finishes, it should show the below screen
If you log into the vCenter you should see the Site Recovery icon in the menu
If you click Home and select Recovery Manager, you will see the below screen.
If you click open Site Recovery at the moment, it will ask you to sign in with SSO credentials then it will say the below message. Leave it here while we move on to installing the Recovery SRM server
Now you need to repeat all the above install steps on the Recovery SRM Server
Once the Recovery SRM install is complete, log into vCenter, go to Site Recovery Manager and click on New Site Pair
Enter the details of the First site and Second site
Click next and check the details
Click Finish to Pair the sites
Now you will see the below screen if it is successful
If you now click on the View Details screen, then you will see the full details come up for the two sites
Next we need to install the Hitachi Command Control Interface
Note: I have already copied the software
Right click on Setup and run as Administrator
Read the below text and click Next
The default installation drive is C:\HORCM. I’m installing everything on my D Drive so you’ll see the Destination folder as D:\HORCM
The installer will run and finish
Reboot the server
When the server has rebooted, verify the correct version of the CCI software is running on the system by executing the below command
D:\HORCM\etc> raidqry -h
Install the CCI software on the recovery SRM server, reboot and check the version as per the above steps
Next, you will need two HORCM configuration definition files to define the pair relationship: one file describes the primary volumes (P-VOLs) on the Protected SRM Server, the other describes the secondary volumes (S-VOLs) on the Recovery SRM Server.
Take a copy of the default HORCM.conf file which gets installed with CCI in D:\HORCM\etc, rename it, and place it in D:\HORCM\conf. Note: Just for clarity, I have named the HORCM.conf file on the Protected Server HORCM100.conf, and I’ll name the copy on the Recovery SRM Server HORCM101.conf. They must be consecutive numbers.
And the same on the Recovery site
Open up the HORCM100.conf file in Notepad and have a look at how it is structured (WordPad seems to lose clarity). It is quite a large file full of information (Hitachi documentation example below). You will find the file is much larger than this and can be cut down very simply to the below.
Example HORCM0.conf file from the Hitachi SRA for VMware vCenter SRM deployment guide
HORCM_MON – Information for monitoring the HORCM instance. Includes the IP address of the primary server, HORCM instance or service, polling interval for monitoring paired volumes and timeout period for communication with the remote server.
HORCM_CMD – Command device from the protected storage system. Replace the number with the serial number of the primary storage system
HORCM_LDEV – dev_group is the group name for the pairs. dev_name is the pair name (the example uses P_VOL_S_VOL). Serial# is the storage system’s serial number. CU:LDEV(LDEV#) is the LDEV ID of the P-VOL. MU# is the mirror unit number. Use MU#0-2 for ShadowImage, Thin Image and Copy-on-Write Snapshot. You do not need to specify MU# for TC, UR and GAD; if you do want to specify it, use MU#h0 for TC and MU#h0-h3 for UR and GAD.
HORCM_INST – dev_group is the group name for the pairs. ip_address is the network address of the remote SRM server. service is the remote HORCM instance.
Example HORCM1.conf for the secondary site remote replication pair
HORCM_MON – Shows the IP address of the secondary server, HORCM instance or service, polling interval for monitoring paired volumes, and timeout period for communication with the remote server
HORCM_CMD – Shows the command device on the remote site. Note that the instance or service is increased from the primary instance by 1. Use the storage system’s serial number.
HORCM_LDEV – Shows the same group and device name for the pair as used in the primary site HORCM file. The second entry in this section is a group for the ShadowImage pair used for testing; the remote pair’s S-VOL is the in-system pair’s P-VOL. When using ShadowImage for the in-system pair, make sure that the MU number is set for the P-VOL.
HORCM_INST – Shows the pair’s group name and the IP address and service number of the primary host. The second entry shows the secondary host address.
The TC or UR group must be defined before the SI group.
The MU# (h0-h3) for UR and GAD devices must be specified.
The MU# for ShadowImage devices must be specified. If MU#1 or MU#2 are used, the environment variable RMSRATMU must be set
Here are the 2 files together so you can see how it all works
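To make the structure concrete, here is a stripped-down sketch of the two files. The IP addresses, service names, LDEV IDs and the VMware_SRM group name are hypothetical placeholders; the serial numbers and command device simply reuse the example values found later in this post.

HORCM100.conf (Protected SRM Server):

HORCM_MON
#ip_address   service    poll(10ms)   timeout(10ms)
10.0.1.10     horcm100   1000         3000

HORCM_CMD
#dev_name
\\.\PhysicalDrive2

HORCM_LDEV
#dev_group    dev_name      Serial#   CU:LDEV(LDEV#)   MU#
VMware_SRM    P_VOL_S_VOL   415068    00:10

HORCM_INST
#dev_group    ip_address   service
VMware_SRM    10.0.2.10    horcm101

HORCM101.conf (Recovery SRM Server):

HORCM_MON
#ip_address   service    poll(10ms)   timeout(10ms)
10.0.2.10     horcm101   1000         3000

HORCM_CMD
#dev_name
\\.\PhysicalDrive2

HORCM_LDEV
#dev_group    dev_name      Serial#   CU:LDEV(LDEV#)   MU#
VMware_SRM    P_VOL_S_VOL   415073    00:20

HORCM_INST
#dev_group    ip_address   service
VMware_SRM    10.0.1.10    horcm100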
Do not edit the configuration definition file while CCI is running. Shut down CCI, edit the configuration file as needed, and then restart CCI. (horcmshutdowm) When you change the system configuration, it is required to shut down CCI once and rewrite the configuration definition file to match with the change and then restart CCI. (horcmstart) When you change the storage system configuration (microprogram, cache capacity, LU path, and so on), you must restart CCI regardless of the necessity of the configuration definition file editing. When you restart CCI, confirm that there is no contradiction in the connection configuration by using the “-c” option of the pairdisplay command and the raidqry command. However, you cannot confirm the consistency of the P-VOL and S-VOL capacity with the “-c” option of pairdisplay command. Confirm the capacity of each volume by using the raidcom command
The HORCM.conf file has set parameters as seen below
Environment variables
RMSRA20 requires that the following system environment variables be defined in order to make certain parameters available
Sometimes it may be worth speaking to Hitachi about whether these are needed for certain environments as we have none set at the moment in ours but it is here for reference
Install the Hitachi SRA – Hitachi_Raid_Manager_SRA_Ver02.03.01.zip
Extract the installer from the zip – HITACHI_RMHTCSRA_X64-02-03-01.exe
Run as Administrator
Accept the License Agreement
Choose a destination. I had to change my path to the D Drive as this is where my SRM installation is located
Click Next and Install
Restart the VMware Site Recovery Manager Service on the Protected SRM Server
Install the Hitachi SRA software on the Recovery SRM server
Restart the VMware Site Recovery Manager Service on the Recovery SRM Server
Find the Command Device Name and Array Serial number on each SRM Server
First we need to find the Command Device Name and the serial number of the array on each SRM Server
On the Primary SRM Server, open an elevated command prompt and navigate to the horcm\etc folder on D:
Run the following command to identify the array’s command device name and serial number
raidscan -x findcmddev hdisk0,100
The primary array serial number is 415068
The command device is \\.\PhysicalDrive2
On the Secondary SRM Server, open an elevated command prompt and navigate to the horcm\etc folder on D:
Run the following command to identify the array’s command device name and serial number
raidscan -x findcmddev hdisk0,100
The secondary array serial number is 415073
The command device is \\.\PhysicalDrive2
Add the details above to the HORCM100.conf on the Primary SRM Server and HORCM101.conf file on the Secondary SRM Server
At the top of the HORCM100.conf file we put in the serial number of the array as it makes it easier for us to liaise with Support and Storage if we have an issue, but it is not mandatory
In HORCM_MON we add the IP address of the Primary SRM server and the serial number of the Primary storage array
In HORCM_CMD, we put in the command device which is \\.\PhysicalDrive2
Note: A lot of info is already there but I will talk through these as we go.
At the top of the HORCM101.conf file we put in the serial number of the array as it makes it easier for us to liaise with Support and Storage if we have an issue, but it is not mandatory
In HORCM_MON we add the IP address of the Secondary SRM server and the serial number of the Secondary storage array
In HORCM_CMD, we put in the command device which is \\.\PhysicalDrive2
Configure the opposite details for each site within the HORCM100.conf file on the Primary SRM server and the HORCM101.conf file on the Secondary SRM Server
Under the section HORCM_INST within the HORCM100.conf file, fill in the below details highlighted in yellow
Put in the IP address of the Secondary SRM server
Put in the name of the HORCM101.conf file on the Secondary SRM server
Under the section HORCM_INST within the HORCM101.conf file, fill in the below details highlighted in yellow
Put in the IP address of the Primary SRM server
Put in the name of the HORCM100.conf file on the Primary SRM server
Configure the HORCM100_run.txt on the Primary SRM Server and then HORCM101_run.txt file on the Secondary SRM Server
Navigate to D:\HORCM\Tool\HORCM100_run.txt
Set the below parameters highlighted in yellow below
Navigate to D:\HORCM\Tool\HORCM101_run.txt
Set the below parameters highlighted in yellow below
Run the following command from the Tool folder on the Primary SRM Server and then on the Secondary SRM Server, changing the HORCM number to the one you are using
There is a very important note to add here: the -vl flag in the below commands tells the SAN to create the pairing based on the local HORCM instance that is referenced (100 in the case of these commands, as indicated by the -IH100 flag). This means the local LDEV will become the Primary replication LDEV, with the LDEV in the other datacentre becoming the Secondary. In this case, because we have run the command from the PDC SRM server, the replication will go from PDC > SDC, so the datastore in vCenter has to be created in PDC and will be replicated to SDC. With this in mind, it is vital that the pair creation commands are run from the correct SRM server: if the datastores are to be created in PDC, then the pairs need to be created on the PDC SRM server, otherwise the replication will be the wrong way around. After the pair create commands have been run, you can rerun the pair display commands to confirm the correct Primary and Secondary sites; this is discussed in more detail below.
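As a hedged example only, assuming the hypothetical VMware_SRM device group from the HORCM sketch above and both HORCM instances running, the create command issued from the PDC SRM server might look like this (confirm the fence level with your storage team before running anything):
paircreate -g VMware_SRM -vl -f never -IH100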
Next, run a pairdisplay to make sure the LUNs are paired
The -g flag dictates which group will be checked (same as dev_group in the HORCM file). The -IH flag dictates which HORCM instance to query. The -fxc flags dictate which information will be shown by the command.
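Again assuming the hypothetical VMware_SRM group, the display command from the protected side would be along these lines:
pairdisplay -g VMware_SRM -IH100 -fxc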
Next steps – Log into vCenter and Site Recovery Manager
You will be on the Site pair page. You can also see the other 3 options
Click the issues to see if there are any problems
Next go to Array Based Replication and click on Storage Replication Adapters. Click both sites to make sure everything is OK
Click on Array Pairs and click Add
The Array pair wizard will open
For the name, enter Hitachi-ArrayManager
For the local protected HORCM site, enter HORCMINST=100 (100 is our HORCM instance on our protected site)
For the username and password, enter the credentials you have been given by your storage administrator.
In our case the username is horcm and then put in the password
For the remote array manager name, enter Hitachi-ArrayManager-Remote
For the remote recovery HORCM site, enter HORCMINST=101 (101 is our HORCM instance on our recovery site)
For the username and password, enter the credentials you have been given by your storage administrator.
In our case the username is horcm and then put in the password
The array pairs screen will then come up
Click Next and check the last screen and finish
You will now see the paired arrays
If you click on the Array pair, then below you will see the paired datastores
Next we will configure Network Mappings
Select the Recovery network
Check the Test networks. These are used instead of the recovery networks while running tests
Check the Ready to Complete page and click Finish
Next, we will go through Folder Mappings
Choose Prepare Mappings manually
Select the mappings on both sides and click Add
The mappings will look similar to the below screen-print
Select the Reverse mappings
Click Finish after checking the Final screen
Next go to Resource Mapping
Select the Cluster Resource
Select the Reverse mappings
Check the Final Page and click Finish
Placeholder Datastores
When you create an array-based replication protection group that contains datastore groups or a vSphere Replication protection group that contains individual virtual machines, Site Recovery Manager creates a placeholder virtual machine at the recovery site for each of the virtual machines in the protection group.
A placeholder virtual machine is a subset of virtual machine files. Site Recovery Manager uses that subset of files to register a virtual machine with vCenter Server on the recovery site.
The files of the placeholder virtual machines are very small, and do not represent full copies of the protected virtual machines. The placeholder virtual machine does not have any disks attached to it. The placeholder virtual machine reserves compute resources on the recovery site, and provides the location in the vCenter Server inventory to which the protected virtual machine recovers when you run recovery.
The presence of placeholder virtual machines on the recovery site inventory provides a visual indication to vCenter Server administrators that the virtual machines are protected by Site Recovery Manager. The placeholders also indicate to vCenter Server administrators that the virtual machines can power on and start consuming local resources when Site Recovery Manager runs tests or runs a recovery plan.
When you recover a protected virtual machine by testing or running a recovery plan, Site Recovery Manager replaces the placeholder with the recovered virtual machine and powers it on according to the settings of the recovery plan. After a recovery plan test finishes, Site Recovery Manager restores the placeholders and powers off the recovered virtual machines as part of the cleanup process.
Go to Site Recovery Manager > Configure > Placeholder Datastores and click +New
Choose the datastore you created to be the Placeholder Datastore
You will then see the Placeholder Datastore added in SRM
Select the Placeholder Datastore
You will now see your Recovery Placeholder Datastore under the Recovery vCenter
Next we need to create a Protection Group
In SRM, protection groups are a way of grouping VMs that will be recovered together. A protection group contains VMs whose data has been replicated by either array-based replication (ABR) or vSphere Replication (VR). A protection group cannot contain VMs replicated by more than one replication solution (e.g. the same VM protected by both vSphere Replication and array-based replication), and a VM can only belong to a single protection group.
How do Protection Groups fit into SRM?
Recovery Plans in SRM are like an automated run book, controlling all the steps in the recovery process. The recovery plan is the level at which actions like Failover, Planned Migration, Testing and Reprotect are conducted. A recovery plan contains one or more protection groups and a protection group can be included in more than one recovery plan. This provides for the flexibility to test or recover the email application by itself and also test or recover a group of applications or the entire site. Thanks to Kato Grace for this information and diagram below
Click New in the Protection Group screen
Fill in the necessary details and make sure you select the right direction
Select the type of replication (In this case we are using Datastore groups (array-based replication)
Click Next and choose the Datastore(s) you want to add to the Protection Group
Select whether you want to add the Protection Group to a Recovery Plan. For now I will say Do not add as we will go through a Recovery Plan next
Check the Ready to Complete screen and make sure everything is as expected. Click Finish.
You will then be back to the Protection Group page which looks like the following
If you click on the Protection Group, you will see all the details. Check any issues and have a look through the tabs to check everything looks as expected.
Next we will set up a Recovery Plan. Click on the Recovery Plan tab and click New
Put in a Name, Description, Direction and Location
Choose your Protection Group(s)
Leave everything as it is in the Test networks Screen
Click Next and on the Ready to Complete screen, check the details and click Finish
Click on the Recovery Plan tab and then on your previously created Recovery Plan
Before installing the SolarWinds TFTP Server, note that it requires .NET Framework 3.5 and will prompt you to install it if it is missing. You may already have it enabled, or the installer will try to locate and install it for you. The other option is to install it via the Roles and Features option on the Windows Server. For reference, I am using a Windows 2012 R2 server.
Right click on the installer and Run as Administrator
Accept the License Agreement and click Next
Click Install
Click Finish
Open the SolarWinds TFTP Server
Click File > Configure
The below screen will come up. Make a note of the TFTP server root directory.
Other screens look like the below. Server Bindings
Security
Language Screen
You may need to modify the Windows firewall with a rule to allow inbound traffic port 69 UDP for TFTP.
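If you prefer to script this rather than use the firewall GUI, a one-liner from an elevated PowerShell prompt does the job (the display name is arbitrary):
New-NetFirewallRule -DisplayName "SolarWinds TFTP" -Direction Inbound -Protocol UDP -LocalPort 69 -Action Allow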
You now need to download the TFTP Boot Zip file and unzip it into your TFTP folder which here is c:\TFTP-Root
vSphere Auto Deploy lets you provision hundreds of physical hosts with ESXi software.
Using Auto Deploy, administrators can manage large deployments efficiently. Hosts are network-booted from a central Auto Deploy server. Optionally, hosts are configured with a host profile of a reference host. The host profile can be set up to prompt the user for input. After boot up and configuration complete, the hosts are managed by vCenter Server just like other ESXi hosts.
Types of AutoDeploy Install
Auto Deploy can also be used for Stateless caching or Stateful install. There are several more options than there were before which are shown below in a screen-print from a host profile.
What is stateless caching?
Stateless caching addresses the risk of the AutoDeploy server being unavailable by caching the ESXi image on the host’s local storage. If AutoDeploy is unavailable, the host will boot from its locally cached image. There are a few things that need to be in place before stateless caching can be enabled:
Hosts should be set to boot from network first, and local disk second
Ensure that there is a disk with at least 1 GB available
The host should be set up to get its settings from a Host Profile as part of the AutoDeploy rule set.
To configure a host to use stateless caching, the host profile that it will receive needs to be updated with the relevant settings. To do so, edit the host profile, navigate to the ‘System Image Cache Profile Settings’ section, and change the drop-down menu to ‘Enable stateless caching on the host’.
Stateless caching can be seen in the below diagram
What is a Stateful Install?
It is also possible to have AutoDeploy install ESXi. When the host first boots, it will pull the image from the AutoDeploy server; on all subsequent restarts the host will boot from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.
AutoDeploy stateful installs are configured in the same way as stateless caching. Edit the host profile, this time changing the option to ‘Enable stateful installs on the host’:
AutoDeploy Architecture
Pre-requisites
A vSphere Auto Deploy infrastructure will contain the below components
vSphere vCenter Server – vSphere 6.7U1 is the best and most comprehensive option to date.
A DHCP server to assign IP addresses and TFTP details to hosts on boot up – Windows Server DHCP will do just fine.
A TFTP server to serve the iPXE boot loader
An ESXi offline bundle image – Download from my.vmware.com.
A host profile to configure and customize provisioned hosts – Use the vSphere Web Client.
ESXi hosts with PXE enabled network cards
1. VMware AutoDeploy Server
Serves images and host profiles to ESXi hosts.
vSphere Auto Deploy rules engine
Sends information to the vSphere Auto Deploy server about which image profile and which host profile to serve to which host. Administrators use vSphere Auto Deploy to define the rules that assign image profiles and host profiles to hosts.
2. Image Profile Server
Define the set of VIBs to boot ESXi hosts with.
VMware and VMware partners make image profiles and VIBs available in public depots. Use vSphere ESXi Image Builder to examine the depot and use the vSphere Auto Deploy rules engine to specify which image profile to assign to which host.
VMware customers can create a custom image profile based on the public image profiles and VIBs in the depot and apply that image profile to the host
3. Host Profiles
Define machine-specific configuration such as networking or storage setup. Use the host profile UI to create host profiles. You can create a host profile for a reference host and apply that host profile to other hosts in your environment for a consistent configuration
4. Host customization
Stores information that the user provides when host profiles are applied to the host. Host customization might contain an IP address or other information that the user supplied for that host. For more information about host customizations, see the vSphere Host Profiles documentation.
Host customization was called answer file in earlier releases of vSphere Auto Deploy
5. Rules and Rule Sets
Rules
Rules can assign image profiles and host profiles to a set of hosts, or specify the location (folder or cluster) of a host on the target vCenter Server system. A rule can identify target hosts by boot MAC address, SMBIOS information, BIOS UUID, Vendor, Model, or fixed DHCP IP address. In most cases, rules apply to multiple hosts. You create rules by using the vSphere Client or vSphere Auto Deploy cmdlets in a PowerCLI session. After you create a rule, you must add it to a rule set. Only two rule sets, the active rule set and the working rule set, are supported. A rule can belong to both sets (the default) or only to the working rule set. After you add a rule to a rule set, you can no longer change the rule; instead, you copy the rule and replace items or patterns in the copy. If you are managing vSphere Auto Deploy with the vSphere Client, you can edit a rule if it is in an inactive state.
You can specify the following parameters in a rule.
Active Rule Set
When a newly started host contacts the vSphere Auto Deploy server with a request for an image profile, the vSphere Auto Deploy server checks the active rule set for matching rules. The image profile, host profile, vCenter Server inventory location, and script object that are mapped by matching rules are then used to boot the host. If more than one item of the same type is mapped by the rules, the vSphere Auto Deploy server uses the item that is first in the rule set.
Working Rule Set
The working rule set allows you to test changes to rules before making the changes active. For example, you can use vSphere Auto Deploy cmdlets for testing compliance with the working rule set. The test verifies that hosts managed by a vCenter Server system are following the rules in the working rule set. By default, cmdlets add the rule to the working rule set and activate the rules. Use the NoActivate parameter to add a rule only to the working rule set.
You use the following workflow with rules and rule sets.
Make changes to the working rule set.
Test the working rule set rules against a host to make sure that everything is working correctly.
Refine and retest the rules in the working rule set.
Activate the rules in the working rule set. If you add a rule in a PowerCLI session and do not specify the NoActivate parameter, all rules that are currently in the working rule set are activated. You cannot activate individual rules.
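A rough PowerCLI sketch of that workflow follows; the rule name, image profile, host profile and IP range here are hypothetical placeholders, and the cmdlet names come from PowerCLI’s Auto Deploy module:
New-DeployRule -Name "TestRule" -Item "ESXi-6.7.0-standard", "MyHostProfile" -Pattern "ipv4=192.168.1.100-192.168.1.102"
Add-DeployRule -DeployRule "TestRule" -NoActivate
Get-VMHost | Test-DeployRuleSetCompliance
Switch-ActiveDeployRuleSet
The -NoActivate switch keeps the new rule in the working rule set only; Switch-ActiveDeployRuleSet activates the working rule set once you are happy with the test results.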
AutoDeploy Boot Process
The boot process is different for hosts that have not yet been provisioned with vSphere Auto Deploy (first boot) and for hosts that have been provisioned with vSphere Auto Deploy and added to a vCenter Server system (subsequent boot).
First Boot Prerequisites
Before a first boot process, you must set up your system.
Set up a DHCP server that assigns an IP address to each host upon startup and that points the host to the TFTP server to download the iPXE boot loader from.
If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
Identify an image profile to be used in one of the following ways.
Choose an ESXi image profile in a public depot.
Create a custom image profile by using vSphere ESXi Image Builder, and place the image profile in a depot that the vSphere Auto Deploy server can access. The image profile must include a base ESXi VIB.
If you have a reference host in your environment, export the host profile of the reference host and define a rule that applies the host profile to one or more hosts.
Specify rules for the deployment of the host and add the rules to the active rule set.
First Boot Overview
When a host that has not yet been provisioned with vSphere Auto Deploy boots (first boot), the host interacts with several vSphere Auto Deploy components.
When the administrator turns on a host, the host starts a PXE boot sequence. The DHCP server assigns an IP address to the host and instructs the host to contact the TFTP server.
The host contacts the TFTP server and downloads the iPXE file (executable boot loader) and an iPXE configuration file.
iPXE starts executing. The configuration file instructs the host to make an HTTP boot request to the vSphere Auto Deploy server. The HTTP request includes hardware and network information.
In response, the vSphere Auto Deploy server performs these tasks:
Queries the rules engine for information about the host.
Streams the components specified in the image profile, the optional host profile, and optional vCenter Server location information.
The host boots using the image profile. If the vSphere Auto Deploy server provided a host profile, the host profile is applied to the host.
vSphere Auto Deploy adds the host to the vCenter Server system that vSphere Auto Deploy is registered with.
If a rule specifies a target folder or cluster on the vCenter Server system, the host is placed in that folder or cluster. The target folder must be under a data center.
If no rule exists that specifies a vCenter Server inventory location, vSphere Auto Deploy adds the host to the first datacenter displayed in the vSphere Client UI.
If the host profile requires the user to specify certain information, such as a static IP address, the host is placed in maintenance mode when the host is added to the vCenter Server system. You must reapply the host profile and update the host customization to have the host exit maintenance mode. When you update the host customization, answer any questions when prompted.
If the host is part of a DRS cluster, virtual machines from other hosts might be migrated to the host after the host has successfully been added to the vCenter Server system.
Subsequent Boots Without Updates
For hosts that are provisioned with vSphere Auto Deploy and managed by a vCenter Server system, subsequent boots can become completely automatic.
The administrator reboots the host.
As the host boots up, vSphere Auto Deploy provisions the host with its image profile and host profile.
Virtual machines are brought up or migrated to the host based on the settings of the host.
Standalone host. Virtual machines are powered on according to autostart rules defined on the host.
DRS cluster host. Virtual machines that were successfully migrated to other hosts stay there. Virtual machines for which no host had enough resources are registered to the rebooted host.
If the vCenter Server system is unavailable, the host contacts the vSphere Auto Deploy server and is provisioned with an image profile. The host continues to contact the vSphere Auto Deploy server until vSphere Auto Deploy reconnects to the vCenter Server system.
vSphere Auto Deploy cannot set up vSphere distributed switches if vCenter Server is unavailable, and virtual machines are assigned to hosts only if they participate in an HA cluster. Until the host is reconnected to vCenter Server and the host profile is applied, the switch cannot be created. Because the host is in maintenance mode, virtual machines cannot start.
Important: Any hosts that are set up to require user input are placed in maintenance mode
Subsequent Boots With Updates
You can change the image profile, host profile, vCenter Server location, or script bundle for hosts. The process includes changing rules and testing and repairing the host’s rule compliance.
The administrator uses the Copy-DeployRule PowerCLI cmdlet to copy and edit one or more rules and updates the rule set.
The administrator runs the Test-DeployRulesetCompliance cmdlet to check whether each host is using the information that the current rule set specifies.
The host returns a PowerCLI object that encapsulates compliance information.
The administrator runs the Repair-DeployRulesetCompliance cmdlet to update the image profile, host profile, or vCenter Server location the vCenter Server system stores for each host.
When the host reboots, it uses the updated image profile, host profile, vCenter Server location, or script bundle for the host. If the host profile is set up to request user input, the host is placed in maintenance mode.
Note: Do not change the boot configuration parameters to avoid problems with your distributed switch
Prepare your system for AutoDeploy
Before you can PXE boot an ESXi host with vSphere Auto Deploy, you must install prerequisite software and set up the DHCP and TFTP servers that vSphere Auto Deploy interacts with.
Prerequisites
Verify that the hosts that you plan to provision with vSphere Auto Deploy meet the hardware requirements for ESXi. See ESXi Hardware Requirements.
Verify that the ESXi hosts have network connectivity to vCenter Server and that all port requirements are met. See vCenter Server Upgrade.
Verify that you have a TFTP server and a DHCP server in your environment to send files and assign network addresses to the ESXi hosts that Auto Deploy provisions.
Verify that the ESXi hosts have network connectivity to DHCP, TFTP, and vSphere Auto Deploy servers.
If you want to use VLANs in your vSphere Auto Deploy environment, you must set up the end to end networking properly. When the host is PXE booting, the firmware driver must be set up to tag the frames with proper VLAN IDs. You must do this set up manually by making the correct changes in the UEFI/BIOS interface. You must also correctly configure the ESXi port groups with the correct VLAN IDs. Ask your network administrator how VLAN IDs are used in your environment.
Verify that you have enough storage for the vSphere Auto Deploy repository. The vSphere Auto Deploy server uses the repository to store data it needs, including the rules and rule sets you create and the VIBs and image profiles that you specify in your rules. Best practice is to allocate 2 GB to have enough room for four image profiles and some extra space. Each image profile requires approximately 350 MB. Determine how much space to reserve for the vSphere Auto Deploy repository by considering how many image profiles you expect to use.
Obtain administrative privileges to the DHCP server that manages the network segment you want to boot from. You can use a DHCP server already in your environment, or install a DHCP server. For your vSphere Auto Deploy setup, replace the gpxelinux.0 filename with snponly64.efi.vmw-hardwired for UEFI or undionly.kpxe.vmw-hardwired for BIOS.
Secure your network as you would for any other PXE-based deployment method. vSphere Auto Deploy transfers data over SSL to prevent casual interference and snooping. However, the authenticity of the client or the vSphere Auto Deploy server is not checked during a PXE boot.
If you want to manage vSphere Auto Deploy with PowerCLI cmdlets, verify that Microsoft .NET Framework 4.5 or 4.5.x and Windows PowerShell 3.0 or 4.0 are installed on a Windows machine. You can install PowerCLI on the Windows system on which vCenter Server is installed or on a different Windows system. See the vSphere PowerCLI User’s Guide.
Set up a remote Syslog server. See the vCenter Server and Host Management documentation for Syslog server configuration information. Configure the first host you boot to use the remote Syslog server and apply that host’s host profile to all other target hosts. Optionally, install and use the vSphere Syslog Collector, a vCenter Server support tool that provides a unified architecture for system logging and enables network logging and combining of logs from multiple hosts.
Install ESXi Dump Collector, set up your first host so that all core dumps are directed to ESXi Dump Collector, and apply the host profile from that host to all other hosts.
If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
Starting to configure AutoDeploy
Step 1 – Enable the AutoDeploy, Image Builder Service and Dump Collector Service
Install vCenter Server or deploy the vCenter Server Appliance. The vSphere Auto Deploy server is included with the management node.
Configure the vSphere Auto Deploy service startup type.
On the vSphere Web Client Home page, click Administration.
Under System Configuration, click Services
Select Auto Deploy, click the Actions menu, and select Edit Startup Type and select Automatic
(Optional) If you want to manage vSphere Auto Deploy with the vSphere Web Client, configure the vSphere ESXi Image Builder service startup type
Check the Startup
Log out of the vSphere Web Client and log in again. The Auto Deploy icon is visible on the Home page of the vSphere Web Client
Enable the Dump Collector
You can now either set the dump collector manually on each host or configure the host profile with the settings
If you want to enter it manually and point the dump collector to the vCenter then the following commands are used
esxcli system coredump network set --interface-name vmk0 --server-ipv4 10.242.217.11 --server-port 6500
esxcli system coredump network set --enable true
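Once set, you can confirm the host can actually reach the dump collector with:
esxcli system coredump network check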
Enable Automatic Startup
Step 2 – Configure the TFTP server
There are different options here. Some people use SolarWinds, or there is now the option to use the inbuilt TFTP service on the vCenter.
Important: The TFTP service in vCenter is only supported for dev and test environments, not production, and it will be removed from future releases of vCenter. It is best to have a separate TFTP server.
Instructions
Now that Auto Deploy is enabled we can configure the TFTP server. Enable SSH on the VCSA by browsing to the Appliance Management page: https://VCSA:5480 where VCSA is the IP or FQDN of your appliance.
Log in as the root account. From the Access page enable SSH Login and Bash Shell.
SSH onto the vCenter Appliance using a client such as PuTTY, and log in with the root account. First type shell and hit Enter to launch Bash.
To start the TFTP service enter service atftpd start
Check the service is started using service atftpd status
To allow TFTP traffic through the firewall on port 69, we must run the following command (note the double dashes in front of dport).
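The rule will look something like the below; port_filter is the input chain name commonly seen on the VCSA, so verify the chain name in your own appliance with iptables -nL first:
iptables -A port_filter -p udp -m udp --dport 69 -j ACCEPT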
Validate traffic is being accepted over port 69 using the following command
iptables -nL | grep 69
iptables can be found in /etc/systemd/scripts just for reference
Type chkconfig atftpd on
To make the iptables rules persistent, load them after a reboot from a script.
Save the current active rules to a file
iptables-save > /etc/iptables.rules
Next create the below script and call it starttftp.sh
#!/bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd
### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO
service atftpd start
iptables-restore -c < /etc/iptables.rules
Put the starttftp.sh script in /etc/init.d via WinSCP
Put full permissions on the script
This should execute the command and reload the firewall tables after the system is rebooted
Reboot the vCenter appliance to test the script is running. If successful the atftpd service will be started and port 69 allowed, you can check these with service atftpd status and iptables -nL | grep 69.
Your TFTP directory is located at /tftpboot/
The TRAMP file on the vCenter must also now be modified and the DNS name removed and replaced with the IP address of the vCenter. Auto Deploy will not work without doing this part
The directory already contains the necessary files for Auto Deploy (tramp file, undionly.kpxe.vmw-hardwired, etc) Normally if you use Solarwinds TFTP server, you would need to download the TFTP Boot Zip and extract the files into the TFTP Root folder
Note there may be an issue with downloading this file due to security restrictions being enabled by some of the well known browsers – This is the likely message seen below. You may have to modify a browser setting in order to access the file
If everything is ok then you’ll be able to download it but note again, you do not need to download this if you are using the inbuilt TFTP server in vCenter as the files are already there.
Step 3 – Setting up DHCP options
The DHCP server assigns an IP address to the ESXi host when the host boots. The DHCP server also provides two required options to point the host to the TFTP server and to the boot files necessary for vSphere Auto Deploy to work. These additional DHCP options are as follows:
066 – Boot Server Host Name – This option must be enabled, and the IP address of the server running TFTP should be inserted into the data entry field.
067 – Bootfile Name – The “BIOS DHCP File Name” found in the vSphere Auto Deploy settings of your vCenter Server must be used here. The file name is undionly.kpxe.vmw-hardwired.
Go to Server Options and click Configure Options
In the value for option 066 (next-server), enter the FQDN of the TFTP boot server. In my case this is my vCenter Server, which hosts the TFTP service
Select option 067 and type in undionly.kpxe.vmw-hardwired. The undionly.kpxe.vmw-hardwired iPXE binary will be used to boot the ESXi host
Note: if you were using UEFI, you would need to put snponly64.efi.vmw-hardwired
You should now see the two options in DHCP
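For reference, the same two options can also be set from PowerShell on the DHCP server using the DhcpServer module (the server FQDN here is a placeholder):
Set-DhcpServerv4OptionValue -OptionId 66 -Value "vcsa.lab.local"
Set-DhcpServerv4OptionValue -OptionId 67 -Value "undionly.kpxe.vmw-hardwired"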
Next we need to add a scope and reservations to this scope
Right click IPv4 and select New Scope
A wizard will pop up
Put in a name and description
Put in the network IP range and subnet mask for the scope. Note: I have 3 hosts for testing.
Ignore the next screen and click Next
Ignore the next screen and click Next
Click No to configure options afterwards
Click Finish
We now need to create a DHCP reservation for each target ESXi host
In the DHCP window, navigate to DHCP > hostname > IPv4 > Autodeploy Scope > Reservations.
Right-click Reservations and select New Reservation.
In the New Reservation window, specify a name, IP address, and the MAC address for one of the hosts. Do not include the colon (:) in the MAC address.
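Equivalently from PowerShell, you can add one reservation per host; the scope ID, IP address and MAC shown are placeholders matching the test range used later in this post:
Add-DhcpServerv4Reservation -ScopeId 192.168.1.0 -IPAddress 192.168.1.100 -ClientId "AA-BB-CC-DD-EE-FF" -Name "esxi01"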
The initial installation and setup is now finished and we can now start with the next stage
Stage 4 – Image Builder and AutoDeploy GUI
The next stage involves logging into myvmware.com and downloading an offline bundle of the version of ESXi you need
Go to Home > Autodeploy in vCenter and select Add a Software Depot
Click Software Depots and then click Import Software Depot and upload. Space-wise, around four images are normally recommended.
Once uploaded, click on the depot and you should see the below
And
If you click on an image, you get options above where you can clone or export to an iso for example
Stage 5 – Creating a Deploy Rule
A deploy rule gives you control over the deployment process, since you can specify which image profile is rolled out and to which servers. Once a rule is created, you can also edit or clone it. The rule then has to be activated for it to apply; if rules are not activated, Auto Deploy will fail.
Click on the Deploy Rules tab and add a name
Next we want to select hosts that match the following pattern. There are multiple options
Asset
Domain
Gateway IPv4
Hostname
IPv4
IPv6
MAC address
Model
OEM string
Serial number
UUID
Vendor
I am going to use an IP range of my 3 hosts which is 192.168.1.100-192.168.1.102
Next Select an Image Profile
Select the ESXi image to deploy to the hosts, change the software depot from the drop-down menu if needed, then click Next. If you have any issues with VIB signatures, you can skip the signature checks using the tick box.
The main setting which must be chosen when creating a host profile to be used with AutoDeploy is how you want AutoDeploy to install an image as per below
Host Profile selection screen
Next Select a location
Next you will be on the Ready to Complete screen. Check the details and click Finish if you are happy
Note: The rule will be inactive – To use it, you will need to activate it but we will cover this in the next steps
The deploy rule is created but in an inactive state. Select the deploy rule and note the options; Activate / Deactivate, Clone, Edit, Delete. Click Activate / Deactivate, a new window will open. Select the newly created deploy rule and click Activate, Next, and Finish.
Now the deploy rule is activated; when you boot a host where the deploy rule is applicable you will see it load ESXi and the customization specified in the host profile. Deploy rules need to be deactivated before they can be edited.
You can setup multiple deploy rules using different images for different clusters or host variables. Hosts using an Auto Deploy ruleset are listed in the Deployed Hosts tab, hosts that didn’t match any deploy rules are listed under Discovered Hosts
Stage 6 – Reboot the ESXi host and see if the AutoDeploy deployment works as expected.
When you reboot a host, it will then come up as per the below screenprint
Once booted up, remediate the host
If you type in the following URL – https://<vCenter IP>:6502/vmw/rbd, it should take you to the Auto Deploy Debugging page where you can view registered hosts along with a detailed view of host and PXE information as well as the Auto Deploy Cache content
What do you do when you need to modify the Image Profile or Host Profile?
There are 2 commands you need to run to ensure the hosts can pick up the new data from the AutoDeploy rule, whether it be a new image or a new/modified host profile. If you don’t run these, you will likely find that when you reboot your vSphere hosts, they still boot from the old image.
This situation occurs when you update the active ruleset without updating the corresponding host entries in the auto deploy cache. The first time a host boots, the Auto Deploy server parses the host attributes against the active ruleset to determine (a) the image profile, (b) the host profile, and (c) the location of the host in the vCenter inventory. This information then gets saved in the auto deploy cache and reused on all future reboots. The strategy behind saving this information is to reduce the load on the auto deploy server by eliminating the need to parse each host against the rules engine on every reboot. With this approach, each host only gets parsed against the active ruleset once (on the initial boot), after which the results get saved and reused on all subsequent reboots.
However, anytime you make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, you not only need to update the rules in the active ruleset but also the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet.
Use the “Test-DeployRuleSetCompliance” cmdlet to check if the host information saved on the auto deploy server is up-to-date. This cmdlet parses the host attributes against the active ruleset and compares the results with the information saved in the cache. If the saved information is incorrect (i.e. out of compliance) the cmdlet will return a status of “Non-Compliant” and show what needs to be updated. If the information in the cache is correct, then the command will simply return an empty string.
In order to check one host, we can use Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local. It will tell us it is non-compliant.
In order to repair a single host, we can use the below piped command. If you get an empty string back, then the cache is now correct and ready to use the new image.
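For example, using the same host as above:
Get-VMHost lg-spsp-cex03.lseg.stockex.local | Test-DeployRuleSetCompliance | Repair-DeployRuleSetCompliance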
However, if we want to be clever about this because we have a lot of hosts, then we can run a quick simple PowerCLI “foreach” loop so we don’t have to update one host at a time
foreach ($esx in get-vmhost) {$esx | test-deployrulesetcompliance | repair-deployrulesetcompliance}
At this point, I would now start the TFTP service on the vCenter. Note: If you are using SolarWinds, this is not necessary, unless you want to double-check it is all OK first.
Next, reboot the hosts and check they come up as the right version; an example of our environment pre and post remediation is below.
Other issues we faced!
Issue 1 – TFTP Service on the vCenter
We used the TFTP service which was inbuilt into the vCenter. What you will find if you use this is that it will start, but then automatically stop itself after a while, which is fine; it’s just a case of remembering to start it. I found that with our HPE hosts, even after modifying the AutoDeploy rule and running the Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance cmdlets, a host was still booting from cache. In the ILO screen, you could see it picking up a DHCP address and the DHCP service passing the TFTP server to the host, but then it timed out. Once the service was started on the vCenter, it was fine.
service atftpd start
service atftpd status
Note: Apparently VMware do not support the inbuilt vCenter service, so when we asked how we could keep the service running, we were told they wouldn’t help with it. It is probably best to install something like SolarWinds, which will keep the service running continuously.
Issue 2 – HPE Oneview Setting for PXE Boot
We found that with HPE BL460 Blades with SSD cards in, sometimes an empty host would boot up and lock a partition. This resulted in the host profile not being able to be applied and settings all over the place, and there was absolutely no way of getting round it. We could only resolve it by using GParted to wipe the disk and boot again. There seemed to be no logic to it, though, as 5 out of 10 fresh hosts would boot up fine and 5 would not, locking the partition.
This is what you would see if you hover over the error in vCenter
As virtual infrastructures grow, it can become increasingly difficult and time consuming to configure multiple hosts in similar ways. Existing per-host processes typically involve repetitive and error-prone configuration steps. As a result, maintaining configuration consistency and correctness across the datacenter requires increasing amounts of time and expertise, leading to increased operational costs. Host Profiles eliminates per-host, manual or UI-based host configuration and maintains configuration consistency and correctness across the datacenter by using Host Profiles policies. These policies capture the blueprint of a known, validated reference host configuration, including the networking, storage, security and other settings.
You can then use this profile to:
• Automate host configuration across a large number of hosts and clusters. You can use Host Profiles to simplify the host provisioning process, configure multiple hosts in a similar way, and reduce the time spent on configuring and deploying new VMware ESX/ESXi hosts.
• Monitor for host configuration errors and deviations. You can use Host Profiles to monitor for host configuration changes, detect errors in host configuration, and ensure that the hosts are brought back into a compliant state. With Host Profiles, the time required to set up, change, audit and troubleshoot configurations drops dramatically due to centralized configuration and compliance checking. Not only does it reduce labor costs, but it also minimizes risk of downtime for applications/ virtual machines provisioned to misconfigured systems.
Accessing Host Profiles
Click Home > Host Profiles
You should see the below
What can we do with Host Profiles?
Create a Host Profile
Edit a Host Profile
Extract a Host Profile from a host
Attach a Host Profile to a host or cluster
Check compliance
Remediate a host
Duplicate a Host Profile
Copy settings from a host – If the configuration of the reference host changes, you can update the Host Profile so that it matches the reference host’s new configuration
Import a Host Profile – .vpf
Export a Host Profile – .vpf
Steps to create a profile
Host Profiles automates host configuration and ensures compliance in four steps:
Step 1: Create a profile, using the designated reference host. To create a host profile, VMware vCenter Server retrieves and encapsulates the configuration settings of an existing VMware ESX/ESXi host into a description that can be used as a template for configuring other hosts. These settings are stored in the VMware vCenter Server database and can be exported into the VMware profile format (.vpf).
Step 2: Attach a profile to a host or cluster. After you create a host profile, you can attach it to a particular host or cluster. This enables you to compare the configuration of a host against the appropriate host profile.
Step 3: Check the host’s compliance against a profile. Once a host profile is created and attached with a set of hosts or clusters, VMware vCenter Server monitors the configuration settings of the attached entities and detects any deviations from the specified “golden” configuration encapsulated by the host profile.
Step 4: Apply the host profile of the reference host to other hosts or clusters of hosts. If there is a deviation, VMware vCenter Server determines the configuration that applies to a host. To bring noncompliant hosts back to the desired state, the VMware vCenter Server Agent applies a host profile by passing host configuration change commands to the VMware ESX/ESXi host agent through the vSphere API
Steps to create a host profile
1. In the Host Profiles view, click Extract Profile from a host
2. You should get a wizard pop up. Choose the vCenter followed by the host you want to extract the profile from
3. Put in a name and description
4. Ready to Complete
5. A Host profile will be created and appear in the Host Profiles section
6. Edit the settings of the Host Profile by right clicking on the profile and click Edit Settings
7. The Edit Host Profile screen will pop up
8. Click Next to get to the Settings screen
9. When you edit the Host profile you can expand the Host profiles configuration hierarchy to see the sub profile components that comprise the Host profile. These components are categorised by functional group or resource class to make it easier to find a particular parameter. Each subprofile component contains one or more attributes and parameters, along with the policies and compliance checks
10. You can also mark settings as favourites by clicking the yellow star. You can then click View > Favourites to simplify searching for settings.
11. For example, we have a default shared datastore for storing logs, with each host logging under its own unique name. This saves us time configuring it manually (see the PowerCLI sketch after this walkthrough)
12. Note: There is an important setting if you are using a host profile with AutoDeploy. It will dictate how ESXi is installed and how the install will work on future reboots. vSphere has introduced new options described below for deploying hosts. I will be doing a further blog about AutoDeploy using these settings
Stateless Caching
Upon provisioning, the ESXi image is written (cached) to the host's local or USB disk. This option is particularly useful when multiple ESXi hosts are being provisioned concurrently: rather than saturating the network, each host is re-provisioned from the cached image on its local or USB disk. However, problems such as the following can occur.
a) If the vCenter Server is available but the vSphere Auto Deploy server is unavailable, hosts do not connect to the vCenter Server system automatically. You can manually connect the hosts to the vCenter Server, or wait until the vSphere Auto Deploy server is available again.
b) If both vCenter Server and vSphere Auto Deploy are unavailable, you can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.
c) If vCenter Server is not available, vSphere DRS does not work. The vSphere Auto Deploy server cannot add hosts to the vCenter Server. You can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.
d) If you make changes to your setup while connectivity is lost, the changes are lost when the connection to the vSphere Auto Deploy server is restored.
Stateful Install
When the host first boots, it will pull the image from the AutoDeploy server; on all subsequent restarts the host will boot from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.
13. Once we have finished customising our profile, we can save it then we need to attach it to our hosts
14. Click the Attach/Detach Hosts and Clusters button within Host Profiles. A wizard will appear. I'm just going to test one of my hosts first and select Attach. Keep Skip Host Customization unticked so we can see where any missing information needs entering.
15. You will likely get some host customization errors as I did where I needed to fill in a DNS name of my host and add a username and password to join the hosts to the domain.
16. Next click on the button to check host compliance
17. I can see that one of my hosts is not compliant so I will see what I need to adjust
18. So I double check all my settings and find that yes, there is a mismatch in the config for esxupdate in the firewall config and there are different values between hosts for syslog settings. I’ll check and adjust these and run the Check Host Compliance again.
19. Lo and behold, I now have 3 compliant hosts 🙂
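Following on from step 11, here's a minimal PowerCLI sketch for pointing a host's logs at a shared datastore; the datastore and host names are hypothetical:
$esx = Get-VMHost 'techlabesxi01.techlab.local'
# Point host logs at a directory on the shared datastore
Get-AdvancedSetting -Entity $esx -Name 'Syslog.global.logDir' |
    Set-AdvancedSetting -Value '[SharedLogsDatastore] esxilogs' -Confirm:$false
# Give each host its own uniquely named subdirectory
Get-AdvancedSetting -Entity $esx -Name 'Syslog.global.logDirUnique' |
    Set-AdvancedSetting -Value $true -Confirm:$false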
Reference Host setup for Autodeploy
A well-designed reference host connects to all services such as syslog, NTP, and so on. The reference host setup might also include security, storage, networking, and ESXi Dump Collector. You can apply such a host’s setup to other hosts by using host profiles.
The exact setup of your reference host depends on your environment, but you might consider the following customization.
NTP Server Setup
When you collect logging information in large environments, you must make sure that log times are coordinated. Set up the reference host to use the NTP server in your environment that all hosts can share. You can specify an NTP server by running the vicfg-ntp command. You can start and stop the NTP service for a host with the vicfg-ntp command, or the vSphere Web Client.
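If you prefer to script the reference host's NTP setup rather than use vicfg-ntp, a hedged PowerCLI equivalent (the server names are placeholders) would be:
$ref = Get-VMHost 'techlabesxi01.techlab.local'
Add-VMHostNtpServer -VMHost $ref -NtpServer '0.uk.pool.ntp.org','1.uk.pool.ntp.org'
# Set the ntpd service to start with the host and start it now
Get-VMHostService -VMHost $ref | Where-Object { $_.Key -eq 'ntpd' } |
    Set-VMHostService -Policy On | Start-VMHostService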
Edit the Host profile with the settings for your NTP service
Syslog Server Setup
All ESXi hosts run a syslog service (vmsyslogd), which logs messages from the VMkernel and other system components to a file. You can specify the log host and manage the log location, rotation, size, and other attributes by running the esxcli system syslog vCLI command or by using the vSphere Web Client. Setting up logging on a remote host is especially important for hosts provisioned with vSphere Auto Deploy that have no local storage. You can optionally install the vSphere Syslog Collector to collect logs from all hosts.
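As a sketch, the esxcli syslog configuration mentioned above can also be driven through PowerCLI's Get-EsxCli; the syslog server name here is an assumption:
$esxcli = Get-EsxCli -VMHost (Get-VMHost 'techlabesxi01.techlab.local') -V2
$esxcli.system.syslog.config.set.Invoke(@{ loghost = 'udp://syslog.techlab.local:514' })
$esxcli.system.syslog.reload.Invoke()   # reload to apply the new log host
# Open the outbound syslog firewall ruleset so the logs can leave the host
$esxcli.network.firewall.ruleset.set.Invoke(@{ rulesetid = 'syslog'; enabled = $true })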
Edit the Host profile with the below 2 settings
Core Dump Setup
You can set up your reference host to send core dumps to a shared SAN LUN, or you can enable ESXi Dump Collector in the vCenter appliance and configure the reference host to use ESXi Dump Collector. After setup is complete, VMkernel memory is sent to the specified network server when the system encounters a critical failure.
Turn on the Dump Collector service in vCenter
Configure the host profile to enable and point the host to the vCenter on port 6500
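A minimal sketch of that configuration against a single host, assuming vmk0 carries management traffic and a hypothetical vCenter IP:
$esxcli = Get-EsxCli -VMHost (Get-VMHost 'techlabesxi01.techlab.local') -V2
$esxcli.system.coredump.network.set.Invoke(@{ interfacename = 'vmk0'; serveripv4 = '192.168.1.50'; serverport = 6500 })
$esxcli.system.coredump.network.set.Invoke(@{ enable = $true })
$esxcli.system.coredump.network.get.Invoke()   # confirm the dump collector settings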
Security Setup
In most deployments, all hosts that you provision with vSphere Auto Deploy must have the same security settings. You can, for example, set up the firewall to allow certain services to access the ESXi system, and set up the security configuration, user configuration, and user group configuration for the reference host with the vSphere Web Client or with vCLI commands. Security setup includes shared user access settings for all hosts. You can achieve unified user access by setting up your reference host to use Active Directory. See the vSphere Security documentation.
So this is an interesting one because I don’t have a solution but it is now working so I can only explain what happened. The 3 blades hosting the vCenter were HPE Proliant BL460c Gen10 servers. Once I reached the end of configuring a 6.5U2c vCenter for vCenter HA, I received the following error message.
So after going back and double-checking typos and distributed switch and port group settings, everything looked fine, but this error, as you can see, specifically mentioned a host vmh01. So I decided to run the VCHA wizard again, which produced the same error but listed the second host
As I had 3 hosts in the cluster, I decided to run the wizard a third time which errored on the third host but running it a fourth time meant the VCHA setup ran perfectly and finished without any problems. There was no problem with the vDS or port groups or general networking.
The great thing about VCHA is that, in this instance, it rolls everything back so you can simply start again. You might ask why I haven't taken a snapshot – well, it doesn't allow you to do this! The rollback works very well, in fact it worked 3 times in this scenario. Obviously not so good if you have hundreds of hosts 😀 A very strange problem where the NICs seemed to need a push before deciding to work, however it did work in the end.
This is a short post because I came across a really useful link to a PDF document showing how vCenter and ESXi are connected. This shows what ports and what direction the ports travel which is really useful to understand the internal and external communication with explanations for each port.
The vCenter High Availability architecture uses a three-node cluster to provide availability against multiple types of hardware and software failures. A vCenter HA cluster consists of one Active node that serves client requests, one Passive node to take the role of Active node in the event of failure, and one quorum node called the Witness node. Any Active and Passive node-based architecture that supports automatic failover relies on a quorum or a tie-breaking entity to solve the classic split-brain problem, which refers to data/availability inconsistencies due to network failures within distributed systems maintaining replicated data. Traditional architectures use some form of shared storage to solve the split-brain problem. However, in order to support a vCenter HA cluster spanning multiple datacenters, the design does not assume a shared storage-based deployment. As a result, one node in the vCenter HA cluster is permanently designated as a quorum node, or a Witness node. The other two nodes in the cluster dynamically assume the roles of Active and Passive nodes.
vCenter Server availability is assured as long as there are two nodes running inside a cluster. However, a cluster is considered to be running in a degraded state if there are only two nodes in it. A subsequent failure in a degraded cluster means vCenter services are no longer available.
A vCenter Server appliance is stateful and requires a strong, consistent state for it to work correctly. The appliance state (configuration state or runtime state) is mainly composed of:
• Database data (stored in the embedded PostgreSQL database)
• Flat files (for example, configuration files).
The appliance state must be replicated for a VCHA failover to work properly. For the state stored inside the PostgreSQL database, the PostgreSQL native replication mechanism is used to keep the database data of the primary and secondary in sync. For flat files, a Linux native solution, rsync, is used for replication.
Because the vCenter Server appliance requires strong consistency, it is a strong requirement to use a synchronous form of replication to replicate the appliance state from the Active node to the Passive node.
Installing vCenter HA
Download the relevant vCenter HA iso from the VMware download page
Mount the iso from a workstation or server
We’ll now go through the process of installing the first vCenter Server. I have mounted the iso on my Windows 10 machine
Go to vcsa-ui-installer > win32 > installer.exe
Click Install
Click Next
Click Accept License Agreement
Select Embedded Platform Services Controller. Note you can deploy an external PSC. I am doing the install this way as I want to test the embedded linked mode functionality now available in 6.5U2+ between embedded Platform Services Controllers (this will require the build of another vCenter HA with an embedded PSC, which I'll try and cover in another blog)
Next put in the details for a vCenter or host as the deployment target
Select the Certificate
Put in an appliance name, username and password for the new vCenter appliance
Choose the deployment size and the Storage Size. Click Next
Choose the datastore to locate the vCenter on. Note: I am running vSAN.
Configure network settings. Note: As I chose a host to deploy to, it does not give me any existing vDS port groups. I have chosen to deploy to a host rather than an existing vCenter as I am testing this for a Greenfield build at work which does not have any existing vCenters etc to start with, just hosts.
Note: It would be useful at this point to make sure you have entered the new vCenter name and IP address into DNS.
Check all the details are correct
Click Finish. It should now say Initializing and start deploying
You should see the appliance is being deployed.
When the deployment has finished, you should see this screen.
You can carry on with Step 2 at this point, but I closed the wizard; I'm now going to log in to my vCenter and configure the appliance settings on https://techlabvca002.techlab.local:5480
Click Set up vCenter Server Appliance
Log in to the vCenter
The below screen will pop up. Click Next
Check all details
Put in time servers. I’m connected to the internet through my environment so I use some generic time servers
Enable SSH if you need to – it can be turned off again after configuration for security.
Put in your own SSO configuration
Click Next
Select or unselect the CEIP
Check all the details and click Finish
A message will pop up
The vCenter Appliance will begin the final installation
When complete, you should see the following screen
You can now connect to the vCenter Appliance on the 5480 port and the Web Client
Note: at this point I actually switched to enabling VCHA on my normal first-built vCenter, techlabvca001. I should have added my second vCenter into the same SSO domain as techlabvca001, but I actually set it up as a completely separate vCenter, so it wouldn't let me enable VCHA the way I had set it up. Log into the vSphere Web Client for techlabvca001
Highlight vCenter
Click the Configure tab
Choose Basic
Put in the Active vCenter's HA address and subnet mask
Choose a port group
Click Next
Select Advanced and change the IP settings to what you want
Passive Node
And the Witness Node
Click Next and you will be on the next screen which allows you to specify what location and datastores you can use to place the nodes
Click Edit on the Passive Node
Select the Compute Resource
Choose a datastore – In my case this will be my vSAN
Check the Compatibility checks – in my case it is just notifying me that snapshots will be lost when this is created.
Next adjust the Witness settings – I am not going to go through them all again as they will be the same as the Passive node we just did.
Check the Management network and vCenter HA networks
Click Next, check the final details and click Finish
It will now say vCenter HA being deployed in the vSphere Web client
You should see a Peer machine and a Witness machine being deployed
Once complete you will see VCHA is enabled and you should see your Active vCenter, Passive vCenter and Witness
Click Test Failover to check everything is working as expected
You can also place the HA Cluster in several modes
vCenter services use SSL to communicate securely with each other and with ESXi. SSL communications ensure data confidentiality and integrity. Data is protected and cannot be modified in transit without detection.
vCenter Server services such as the vSphere Web Client also use certificates for initial authentication to vCenter Single Sign-On. vCenter Single Sign-On provisions each set of services (solution user) with a SAML token that the solution user can authenticate with.
In vSphere 6.0 and later, the VMware Certificate Authority (VMCA) provisions each ESXi host and each vCenter Server service with a certificate that is signed by VMCA by default.
You have several options: you can replace the existing certificates with new VMCA-signed certificates, make VMCA a subordinate CA, or replace all certificates with custom certificates. These options are described in the sections below.
Requirements for imported certificates
Key size: 2048 bits or more (PEM encoded)
PEM format. VMware supports PKCS8 and PKCS1 (RSA keys). When you add keys to VECS, they are converted to PKCS8.
x509 version 3
SubjectAltName must contain DNS Name=machine_FQDN
CRT format
Contains the following Key Usages: Digital Signature, Key Encipherment.
Client Authentication and Server Authentication cannot be present under Enhanced Key Usage
VMCA does not support the following certificates
Certificates with wildcards
The algorithms md2WithRSAEncryption 1.2.840.113549.1.1.2, md5WithRSAEncryption 1.2.840.113549.1.1.4, and sha1WithRSAEncryption 1.2.840.113549.1.1.5 are not recommended.
The algorithm RSASSA-PSS with OID 1.2.840.113549.1.1.10 is not supported.
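Before handing a certificate to the Certificate Manager, you can sanity-check it against the requirements above with a few lines of PowerShell – a rough sketch, assuming a hypothetical file path:
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2 'C:\certs\machine_ssl.cer'
$cert.PublicKey.Key.KeySize            # should report 2048 or more
$cert.SignatureAlgorithm.FriendlyName  # avoid md2/md5/sha1 RSA algorithms
# Inspect the SubjectAltName and Key Usage extensions
$cert.Extensions | Where-Object { $_.Oid.FriendlyName -in 'Subject Alternative Name','Key Usage' } |
    ForEach-Object { $_.Oid.FriendlyName; $_.Format($true) }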
The work required for setting up or updating your certificate infrastructure depends on the requirements in your environment, on whether you are performing a fresh install or an upgrade, and on whether you are considering ESXi or vCenter Server.
What is the VMCA?
The VMware Certificate Authority (VMCA) is the default root certificate authority introduced in vSphere 6.0 that supplies the certificates to ensure communication over SSL between vCenter Server components and ESXi nodes in the virtualized infrastructure.
The VMCA is included in the Platform Services Controller and provides certificates for
Solution users (Replacing Solution user certificates is not normally required by company policy)
Machines that have running services
ESXi hosts. An ESXi host gets a signed certificate, stored locally on the server, from the VMCA. For environments that require a different root authority, an administrator must change the option in vCenter to stop automatically provisioning VMCA certificates to hosts.
If you do not currently replace VMware certificates, your environment starts using VMCA-signed certificates instead of self-signed certificates.
What is the VECS?
VMware Endpoint Certificate Store (VECS) serves as a local (client-side) repository for certificates, private keys, and other certificate information that can be stored in a keystore. You can decide not to use VMCA as your certificate authority and certificate signer, but you must use VECS to store all vCenter certificates, keys, and so on. ESXi certificates are stored locally on each host and not in VECS. VECS runs on every embedded deployment, Platform Services Controller node, and management node and holds the keystores that contain the certificates and keys.
How does VMCA deal with certificates?
With VMCA you can deal with certificates in three different ways.
VMCA Default
VMCA uses a self-signed root certificate. It issues certificates to vCenter, ESXi, etc and manages these certificates. These certificates have a chain of trust that stops at the VMCA root certificate. VMCA is not a general-purpose CA and its use is limited to VMware components.
VMCA Enterprise
VMCA is used as a subordinate CA and is issued a subordinate CA signing certificate. It can then issue certificates that chain up to the enterprise CA's root certificate. If you have already issued certificates using VMCA Default and then replace VMCA's root certificate with a CA signing certificate, all certificates issued will be regenerated and pushed out to the components
Custom
In this scenario, VMCA is completely bypassed. This scenario is for customers who want to issue and/or install their own certificates from their own internal PKI, or third-party signed certificates generated from an external PKI such as Verisign or GoDaddy. You will need to issue a cert for every component, and all those certs (except for host certs) need to be installed into VECS.
In Default and Enterprise modes, VMCA certificates can be easily regenerated on demand.
Certificate Manager Tool
For vSphere 6, the procedure for installing certificates has changed. A new Certificate Manager tool is shipped as part of both vCenter for Windows and the VCSA. On the VCSA, the location is below
/usr/lib/vmware-vmca/bin/certificate-manager
Deployments
I’m going to use a custom deployment method to just change machine certs but not ESXi host or Solution certificates.
Hybrid Deployment
You can have VMCA supply some of the certificates, but use custom certificates for other parts of your infrastructure. For example, because solution user certificates are used only to authenticate to vCenter Single Sign-On, consider having VMCA provision those certificates. Replace the machine SSL certificates with custom certificates to secure all SSL traffic.
Company policy often does not allow intermediate CAs. For those cases, hybrid deployment is a good solution. It minimizes the number of certificates to replace and secures all traffic. The hybrid deployment leaves only internal traffic, that is, solution user traffic, to use the default VMCA-signed certificates
Where vSphere uses certificates
ESXi Certificates
Stored locally on each host in the /etc/vmware/ssl directory
ESXi certificates are provisioned by VMCA by default when the host is first added to vCenter and when the host reconnects, but you can use custom certificates instead
Machine SSL Certificates
The machine SSL certificate for each node is used to create an SSL socket on the server side
SSL clients connect to the SSL socket
Used for server verification and for secure communications such as HTTPS or LDAPS
Each node has its own machine SSL certificate. Nodes include vCenter, Platform Services Controller or embedded deployment instance
VMware products use standard X.509 version 3 (X.509v3) certificates to encrypt session information. Session information is sent over SSL between components.
The following services use the machine SSL certificate
The reverse proxy service on each Platform Services Controller node. SSL connections to individual vCenter services always go to the reverse proxy. Traffic does not go to the services themselves.
The vCenter service (vpxd) on management nodes and embedded nodes.
The VMware Directory Service (vmdir) on infrastructure nodes and embedded nodes.
Solution User Certificates
A solution user encapsulates one or more vCenter Server services. Each solution user must be authenticated to vCenter Single Sign-On. Solution users use certificates to authenticate to vCenter Single Sign-On through SAML token exchange
A solution user presents the certificate to vCenter Single Sign-On when it first must authenticate, after a reboot, and after a timeout has elapsed. The timeout (Holder-of-Key Timeout) can be set from the vSphere Web Client or Platform Services Controller Web interface and defaults to 2592000 seconds (30 days).
The following solution user certificate stores are included in VECS on each management or embedded node.
Managing certificates with the vSphere Certificate Manager Utility
There are a few ways of managing certificates but I am going to run through the vSphere Certificate Manager Utility.
The vSphere Certificate Manager utility allows you to perform most certificate management tasks interactively from the command line. vSphere Certificate Manager prompts you for the task to perform, for certificate locations and other information as needed, and then stops and starts services and replaces certificates for you.
If you use vSphere Certificate Manager, you are not responsible for placing the certificates in VECS (VMware Endpoint Certificate Store) and you are not responsible for starting and stopping services.
Before you run vSphere Certificate Manager, be sure you understand the replacement process and procure the certificates that you want to use
Certificate Manager Utility Location
Procedure
First of all, I need to create a template in my own internal Certificate Authority. I'm going to follow the article below, with the steps and screenshots to show what I'm doing.
Connect, via an RDP session, to the CA server you will be generating the certificates from.
Click Start > Run, type certtmpl.msc, and click OK.
In the Certificate Template Console, under Template Display Name, right-click Web Server and click Duplicate Template.
In the Duplicate Template window, select Windows Server 2003 Enterprise for backward compatibility. Note: If you have an encryption level higher than SHA1, select Windows Server 2008 Enterprise.
Click the General Tab
In the Template display name field, enter vSphere 6.x as the name of the new template
Click the Extensions tab.
Select Application Policies and click Edit.
Select Server Authentication and click Remove, then OK.
Note: If Client Authentication exists, remove this from Application Policies as well.
Select Key Usage and click Edit
Select the Signature is proof of origin (nonrepudiation) option. Leave all other options as default
Click OK.
Click on the Subject Name tab.
Ensure that the Supply in the request option is selected.
Click OK to save the template.
Next: add the newly created certificate template to the Certificate Templates section so that it is available for issuing
Connect, via an RDP session, to the CA server you will be generating the certificates from.
Click Start > Run, type certsrv.msc, and click OK.
In the left pane of the Certificate Console, if collapsed, expand the node by clicking the + icon.
Right-click Certificate Templates and click New > Certificate Template to Issue.
Locate vSphere 6.x or vSphere 6.x VMCA under the Name column.
Click OK.
You will then see the certificate template
Next: Create a folder on the VCSA for uploading and downloading certificates
WinSCP into the VCSAs/PSCs and create a folder that you can upload and download to. E.g. /tmp/machine_ssl
Take note of this link before copying any files up through WinSCP – https://kb.vmware.com/s/article/2107727. The commands for enabling access to the VCSA/PSC are below
shell.set --enabled True
shell
chsh -s /bin/bash root
Generate the Certificate signing request
Note: If you have external PSCs, then do these first before doing the vCenters.
The Machine SSL certificate is the certificate you get when you open the vSphere Web Client in a web browser. It is used by the reverse proxy service on every management node, Platform Services Controller, and embedded deployment. You can replace the certificate on each node with a custom certificate.
In PuTTY, navigate to /usr/lib/vmware-vmca/bin/ and run ./certificate-manager
Put in the administrator@vsphere.local account and password
Select Option 1 to Replace Machine SSL certificate with Custom Certificate
Put in the path to the /tmp/machine_ssl folder on the appliance
Enter all the relevant cert info
Output directory path: the path where the private key and the request will be generated
Country: your country, as two letters
Name: the FQDN of your vCSA
Organization: an organization name
OrgUnit: the name of your unit
State: your state or province name
Locality: your city
IPAddress: the vCSA IP address
Email: your e-mail address
Hostname: the FQDN of your vCSA
VMCA Name: the FQDN where your VMCA is located, usually the vCSA FQDN
You will then see the generated csr and key in the /tmp/machine_ssl folder
Open the vmware_issued_csr.csr file and copy the contents
Next: Request a certificate from your CA.
The next step is to use the CSR to request a certificate from your internal Certificate Authority.
Log in to the Microsoft CA certificate authority Web interface. By default, it is http://CA_Server_FQDN/CertSrv.
Click the Request a certificate (.csr) link.
Click advanced certificate request.
Click the Submit a certificate request by using a base-64-encoded CMC or PKCS #10 file, or submit a renewal request by using a base-64-encoded PKCS #7 file link.
Open the certificate request (machine_ssl.csr) in a plain text editor and copy from —–BEGIN CERTIFICATE REQUEST—– to —–END CERTIFICATE REQUEST—– into the Saved Request box.
On the download page, Select “Base 64 encoded” and click on “Download Certificate”. The downloaded file will be called “certnew.cer”. Rename this to “machine_ssl.cer”
Go back to the download web page and click on “Download certificate chain” (ensuring that “Base 64 encoded” is still selected). The downloaded file will be called “certnew.p7b”. Rename this to “cachain.p7b”
We are now going to export the CA Root certificate from the cachain.p7b file. Right-click on the cachain.p7b file and select “Open”
Expand the list and click on the Certificates folder. Right-click on the CA root cert (techlab-TECHLABDCA001-CA in this example), select All Tasks…Export
Select Base 64 encoded
Save the file as root-64.cer
You should now have the machine_ssl.cer file and the root-64.cer file
Using WinSCP copy the machine_ssl.cer and root-64.cer certificate files to the VCSA.
Now that the files have been copied, open the Certificate Manager Utility and select Option 1, Replace Machine SSL certificate with Custom Certificate.
Provide the password to your administrator@vsphere.local account and select Option 2, “Import Custom Certificate(s) and key(s) to replace existing Machine SSL certificate”
You will be prompted for following files:
machine_ssl.cer
machine_ssl.key
root-64.cer
Type Y to begin the process
It will kick off the install
You should get a message to say that everything is completed
Now check to see if everything has gone to plan. One thing to remember before we start: because the new Machine SSL cert has been issued by the CA on the domain controller, you may need to install the root-64.cer file into the browser. Once done, close the browser and log into the vSphere Web Client.
Now open your vCenter login page and check the certificate being used to protect it
You’ll see that the certificate has been verified by “techlab-TECHLABDCA001-CA”. This is the CA running on the Windows domain controller.
A vSAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement. Setting up a stretched cluster can be daunting. More in terms of the networking side than anything else. This blog isn’t meant to be chapter and verse on vSAN stretched clusters. It is meant to help anyone who is setting up the networking, static routes and ports required for a L2 and L3 implementation.
VMware vSAN Stretched Clusters with a Witness Host refers to a deployment where a user sets up a vSAN cluster with 2 active/active sites with an identical number of ESXi hosts distributed evenly between the two sites. The sites are connected via a high bandwidth/low latency link.
The third site hosting the vSAN Witness Host is connected to both of the active/active data-sites. This connectivity can be via low bandwidth/high latency links.
Each site is configured as a vSAN Fault Domain. The way to describe a vSAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual machines are deployed. The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes). In vSAN Stretched Clusters, there is only one witness host in any configuration.
A virtual machine deployed on a vSAN Stretched Cluster will have one copy of its data on site A, a second copy of its data on site B and any witness components placed on the witness host in site C.
Types of networks
VMware recommends the following network types for Virtual SAN Stretched Cluster:
Management network: L2 stretched or L3 (routed) between all sites. Either option works fine; the choice is left up to the customer.
VM network: VMware recommends L2 stretched between data sites. In the event of a failure, the VMs will not require new IP addresses to work on the remote site
vMotion network: L2 stretched or L3 (routed) between data sites; both work fine. The choice is left up to the customer.
Virtual SAN network: VMware recommends L2 stretched between the two data sites and L3 (routed) network between the data sites and the witness site.
The major consideration when implementing this configuration is that each ESXi host comes with a default TCP/IP stack and, as a result, only has a single default gateway. The default route is typically associated with the management network TCP/IP stack. The solution to this issue is to use static routes, which allow an administrator to define a new routing entry indicating which path should be followed to reach a particular network. Static routes are needed between the data hosts and the witness host for the vSAN network, but they are not required for the data hosts on different sites to communicate with each other over the vSAN network. However, in the case of stretched clusters, it might also be necessary to add a static route from the vCenter server to reach the management network of the witness ESXi host if it is not routable, and similarly a static route may need to be added from the ESXi witness management network to reach the vCenter server. This is because the vCenter server will route all traffic via the default gateway.
vSAN Stretched Cluster Visio diagram
The diagram below is for reference; beneath it, the static routes are listed so it is clear what needs to connect to what.
Static Routes
The recommended static routes are
Hosts on the Preferred Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
Hosts on the Secondary Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
The Witness Host on the Witness Site has a static route added so that requests to reach the Preferred Site and Secondary Site are routed out the WitnessPg VMkernel interface
On each host on the Preferred and Secondary site
These were the manual routes added
esxcli network ip route ipv4 add -n 192.168.1.0/24 -g 172.31.216.1 (192.168.1.0/24 being the witness vSAN network and 172.31.216.1 being the gateway on the hosts’ vSAN network)
esxcli network ip route ipv4 list will show you the configured routes
vmkping -I vmk1 192.168.1.10 will confirm via ping that the network is reachable
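If you have a lot of hosts, the same route can be pushed to every host from PowerCLI – a sketch using the example addresses above (the cluster name is hypothetical, and the gateway will differ per site):
foreach ($esx in Get-Cluster 'vSAN-Stretched' | Get-VMHost) {
    $esxcli = Get-EsxCli -VMHost $esx -V2
    # Route the witness vSAN network via the local vSAN gateway
    $esxcli.network.ip.route.ipv4.add.Invoke(@{ network = '192.168.1.0/24'; gateway = '172.31.216.1' })
}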
On the witness
These were the manual routes added
esxcli network ip route ipv4 add -n 172.31.216.0/25 -g 192.168.1.1 (172.31.216.0/25 being the hosts’ vSAN VMkernel network and 192.168.1.1 being the gateway on the witness vSAN network)
esxcli network ip route ipv4 list will show you the configured routes
vmkping -I vmk1 172.31.216.10 will confirm via ping that the network is reachable
Port Requirements
Virtual SAN Clustering Service
12345, 23451 (UDP)
Virtual SAN Cluster Monitoring and Membership Directory Service. Uses UDP-based IP multicast to establish cluster members and distribute Virtual SAN metadata to all cluster members. If disabled, Virtual SAN does not work.
Virtual SAN Transport
2233 (TCP)
Virtual SAN reliable datagram transport. Uses TCP and is used for Virtual SAN storage I/O. If disabled, Virtual SAN does not work.
vSANVP
8080 (TCP)
vSAN VASA Vendor Provider. Used by the Storage Management Service (SMS) that is part of vCenter to access information about Virtual SAN storage profiles, capabilities and compliance. If disabled, Virtual SAN Storage Profile Based Management does not work
Virtual SAN Unicast agent to witness
12321 (UDP)
Self-explanatory – needed for unicast traffic from the data nodes to the witness.
vSAN Storage Hub
The link below is to the VMware Storage Hub, which is the central location for all things vSAN, including the vSAN Stretched Cluster Guide, which is exportable to PDF. Pages 66-67 are relevant to networking/static routes.
Don't think about what can happen in a month. Don't think what can happen in a year. Just focus on the 24 hours in front of you and do what you can to get closer to where you want to be :-)