We had an interesting problem with a 6-host vSAN cluster where one host appeared to be in a network partition according to Skyline Health. I thought it would be useful to document our troubleshooting steps, as they may come in useful. Our problem wasn't one of the usual network misconfigurations, but in order to reach that conclusion we needed to perform the usual tests.
We had removed this host from the vSAN cluster and the HA cluster, removed it from the inventory and rebuilt it, then tried adding it back into the vSAN cluster with the other 5 hosts. It let us add the host to the current vSAN Sub-cluster UUID, but the host then partitioned itself from the other 5 hosts.
The usual restart of hostd, vpxa, clomd and vsanmgmtd did not help.
Test 1 – Check each host’s vSAN details
Running the command below will give you a lot of information about the problem host, in our case techlabesxi1.
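The command in question is the vSAN cluster summary, run from an SSH session on the problem host:

```shell
# Run on the problem host (techlabesxi1)
# Key fields to check in the output: Local Node State (MASTER/BACKUP/AGENT),
# Sub-Cluster Member UUIDs and Sub-Cluster Member Hostnames
esxcli vsan cluster get
```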
Straightaway we can see it is partitioned: the Sub-Cluster Member UUIDs should contain the other 5 hosts' UUIDs, and the Sub-Cluster Member Hostnames should list techlabesxi2, techlabesxi3, techlabesxi4, techlabesxi5 and techlabesxi6. The host has also made itself a MASTER, whereas we already have a master in the other partition, and there can't be two masters in a cluster.
Master role:
A cluster should have only one host with the Master role. More than a single host with the Master role indicates a problem
The host with the Master role receives all CMMDS updates from all hosts in the cluster
Backup role:
The host with the Backup role assumes the Master role if the current Master fails
Normally, only one host has the Backup role
Agent role:
Hosts with the Agent role are members of the cluster
Hosts with the Agent role can assume the Backup role or the Master role as circumstances change
In clusters of four or more hosts, more than one host has the Agent role
Test 2 – Can each host ping the other one?
A lot of problems can be caused by misconfiguration of the vSAN VMkernel port and/or other VMkernel ports; however, this was not our issue. It is worth double-checking everything though. IP addresses across the specific VMkernel ports must be in the same subnet.
Get the networking details from each host by using the command below. This will give you the full VMkernel networking details including the IP address, Subnet Mask, Gateway and Broadcast.
esxcli network ip interface ipv4 address list
It may be necessary to test VMkernel network connectivity between ESXi hosts in your environment. From the problem host, we tried pinging the other hosts' management network.
vmkping -I vmkX x.x.x.x
Where x.x.x.x is the hostname or IP address of the server that you want to ping and vmkX is the VMkernel interface to ping out of.
This was all successful.
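Beyond a basic ping, it is also worth verifying that full-size frames pass between the vSAN VMkernel ports, since an MTU mismatch can partition a host even when small pings succeed. A sketch, assuming the vSAN traffic runs over vmk2 (adjust the interface and sizes for your environment, e.g. jumbo frames):

```shell
# -d disables fragmentation, -s sets the payload size
# 1472 = 1500-byte MTU minus 28 bytes of IP/ICMP headers
vmkping -I vmk2 -d -s 1472 x.x.x.x
```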
Test 3 – Check the unicast agent list and check the NodeUUIDs on each host
To check what each host's node UUID is, you can run:
esxcli vsan cluster unicastagent list
Conclusion
We think what happened was that the non-partitioned hosts had a reference to an old UUID for techlabesxi1 due to us rebuilding the host. The host was removed from the vSAN cluster and the HA cluster and completely rebuilt. However, when we originally removed the host, the other hosts did not update their member lists once it had gone, so when we tried to add it back in, the other hosts didn't recognise it.
The Fix
What we had to do was disable cluster member list updates (the /VSAN/IgnoreClusterMemberListUpdates advanced setting) on each host.
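We haven't reproduced the exact commands here, but a sketch of the kind of procedure this involves, assuming the stale references live in the unicast agent list (the UUIDs, IPs and port are placeholders/defaults; treat this as an illustration rather than a copy-paste fix):

```shell
# On each host: stop automatic cluster member list updates
esxcfg-advcfg -s 1 /VSAN/IgnoreClusterMemberListUpdates

# Remove the stale unicast agent entry for the rebuilt host...
esxcli vsan cluster unicastagent remove -a <old_host_ip>

# ...and re-add it with the host's new node UUID (12321 is the default vSAN unicast port)
esxcli vsan cluster unicastagent add -t node -u <new_node_uuid> -U true -a <host_ip> -p 12321

# Once the cluster has formed correctly, re-enable updates
esxcfg-advcfg -s 0 /VSAN/IgnoreClusterMemberListUpdates
```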
This blog is similar to another I wrote which compared VM Encryption and vSAN encryption on ESXi 6.7U3. This time, I’m comparing VM Encryption performance on ESXi 6.7U3 and ESXi 7.0U2 running on vSAN.
What is the problem which needs to be solved?
I have posted this section before on the previous blog, however it is important to understand the effect an extra layer of encryption has on the performance of your systems. It has become a requirement (sometimes mandatory) for companies to protect both personally identifiable information and data, including protecting communications within and across environments. The EU General Data Protection Regulation (GDPR) is now a legal requirement for global companies to protect the personally identifiable information of all European Union residents. In the last year, the United Kingdom has left the EU; however, the General Data Protection Regulation will still be important to implement. “The Payment Card Industry Data Security Standards (PCI DSS) requires encrypted card numbers. The Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Acts (HIPAA/HITECH) require encryption of Electronic Protected Health Information (ePHI).” (Townsendsecurity, 2019) Little is known about the effect encryption has on the performance of different data held on virtual infrastructure. VM encryption and vSAN encryption are the two data protection options I will evaluate for a better understanding of the functionality and performance effect on software defined storage.
It may be important to understand encryption functionality in order to match business and legal requirements. Certain regulations may need to be met which only specific encryption solutions can provide. Additionally, encryption adds a layer of functionality which is known to have an effect on system performance. With systems which scale into thousands, it is critical to understand what effect encryption will have on functionality and performance in large environments. It will also help when purchasing hardware which has been designed for specific environments to allow some headroom in the specification for the overhead of encryption.
Testing Components
Test lab hardware (8 Servers)
HCIBench Test VMs
80 HCIBench Test VMs will be used for this test. I have placed 10 VMs on each of the 8 Dell R640 servers to provide a balanced configuration. No virtual machines other than the HCIBench test VMs will be run on this system to avoid interference with the testing.
The HCIBench appliance is running Vdbench, not Fio.
The specification of the 80 HCIBench Test VMs is as follows.
RAID Configuration
VM encryption will be tested on RAID1 and RAID6 vSAN storage
VM encryption RAID1 storage policy
vCenter Storage Policy configuration:
Name = raid1_vsan_policy
Storage Type = vSAN
Failures to tolerate = 2 (RAID 1)
Thin provisioned = Yes
Number of disk stripes per object = 2
Encryption enabled = Yes
Deduplication and Compression enabled = No
VM encryption RAID6 storage policy
vCenter Storage Policy configuration:
Name = raid6_vsan_policy
Storage Type = vSAN
Failures to tolerate = 2 (RAID6)
Thin provisioned = Yes
Number of disk stripes per object = 1
Encryption enabled = Yes
Deduplication and Compression enabled = No
HCIBench Test Parameters
The test will run through various types of read/write workload at the different block sizes to replicate different types of applications using 1 and 2 threads.
0% Read 100% Write
20% Read 80% Write
70% Read 30% Write
The block sizes used are
4k
16k
64k
128k
The test plan below, containing 24 tests, will be run for VM Encryption on 6.7U3 and again for VM Encryption on 7.0U2. These are all parameter files which are uploaded into HCIBench and can then run sequentially without intervention throughout the test. I think I left these running for 3 days! The cache is refreshed between tests.
| Test | Number of disks | Working Set % | Number of threads | Block size (k) | Read % | Write % | Random % | Test time (s) |
|------|-----------------|---------------|-------------------|----------------|--------|---------|----------|---------------|
| 1 | 2 (O/S and Data) | 100% | 1 | 4k | 0 | 100 | 100 | 7200 |
| 2 | 2 (O/S and Data) | 100% | 2 | 4k | 0 | 100 | 100 | 7200 |
| 3 | 2 (O/S and Data) | 100% | 1 | 4k | 20 | 80 | 100 | 7200 |
| 4 | 2 (O/S and Data) | 100% | 2 | 4k | 20 | 80 | 100 | 7200 |
| 5 | 2 (O/S and Data) | 100% | 1 | 4k | 70 | 30 | 100 | 7200 |
| 6 | 2 (O/S and Data) | 100% | 2 | 4k | 70 | 30 | 100 | 7200 |
| 7 | 2 (O/S and Data) | 100% | 1 | 16k | 0 | 100 | 100 | 7200 |
| 8 | 2 (O/S and Data) | 100% | 2 | 16k | 0 | 100 | 100 | 7200 |
| 9 | 2 (O/S and Data) | 100% | 1 | 16k | 20 | 80 | 100 | 7200 |
| 10 | 2 (O/S and Data) | 100% | 2 | 16k | 20 | 80 | 100 | 7200 |
| 11 | 2 (O/S and Data) | 100% | 1 | 16k | 70 | 30 | 100 | 7200 |
| 12 | 2 (O/S and Data) | 100% | 2 | 16k | 70 | 30 | 100 | 7200 |
| 13 | 2 (O/S and Data) | 100% | 1 | 64k | 0 | 100 | 100 | 7200 |
| 14 | 2 (O/S and Data) | 100% | 2 | 64k | 0 | 100 | 100 | 7200 |
| 15 | 2 (O/S and Data) | 100% | 1 | 64k | 20 | 80 | 100 | 7200 |
| 16 | 2 (O/S and Data) | 100% | 2 | 64k | 20 | 80 | 100 | 7200 |
| 17 | 2 (O/S and Data) | 100% | 1 | 64k | 70 | 30 | 100 | 7200 |
| 18 | 2 (O/S and Data) | 100% | 2 | 64k | 70 | 30 | 100 | 7200 |
| 19 | 2 (O/S and Data) | 100% | 1 | 128k | 0 | 100 | 100 | 7200 |
| 20 | 2 (O/S and Data) | 100% | 2 | 128k | 0 | 100 | 100 | 7200 |
| 21 | 2 (O/S and Data) | 100% | 1 | 128k | 20 | 80 | 100 | 7200 |
| 22 | 2 (O/S and Data) | 100% | 2 | 128k | 20 | 80 | 100 | 7200 |
| 23 | 2 (O/S and Data) | 100% | 1 | 128k | 70 | 30 | 100 | 7200 |
| 24 | 2 (O/S and Data) | 100% | 2 | 128k | 70 | 30 | 100 | 7200 |
HCIBench Performance Metrics
These metrics will be measured across all tests
| Workload Parameter | Explanation | Value |
|--------------------|-------------|-------|
| IOPS | IOPS measures the number of read and write operations per second | Input/Outputs per second |
| Throughput | Throughput measures the amount of data read or written per second (Average IO size x IOPS = Throughput in MB/s) | MB/s |
| Read Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a data read, latency is the time it takes for the data to come back | ms |
| Write Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a write, latency is the time for the write acknowledgement to return | ms |
| Latency Standard Deviation | Standard deviation is a measure of the amount of variation within a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range | Values must be compared to the standard deviation |
| Average ESXi CPU usage | Average ESXi Host CPU usage | % |
| Average vSAN CPU usage | Average CPU use for vSAN traffic only | % |
Results
IOPs
IOPS measures the number of read and write operations per second. The pattern across the 3 different tests is consistent: the heavier write tests show the lowest IOPS, gradually increasing as the proportion of writes decreases.
IOPS and block size tend to have an inverse relationship. As the block size increases, it takes longer to read a single block, so the number of IOPS decreases; smaller block sizes yield higher IOPS.
With RAID1 VM Encryption, 7.0U2 performs better than 6.7U3 at the lower block sizes (4k and 16k), but at the larger 64k and 128k blocks there is less of a difference, with 6.7U3 having a slight edge in IOPS.
With RAID6 VM Encryption, 7.0U2 has consistently higher IOPS across all tests than 6.7U3.
RAID6 VM Encryption produces fewer IOPS than RAID1 VM Encryption, which is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror. A single RAID6 write operation results in 3 reads and 3 writes (due to double parity): each write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity.
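As a rough illustration of that write penalty (the front-end figure below is hypothetical, and real back-end amplification also depends on caching and coalescing):

```shell
# Back-end IOs generated per front-end write:
# RAID1 mirror: 2 writes; RAID6: 3 reads + 3 writes = 6 IOs
frontend_writes=10000
echo "RAID1 backend IOs: $((frontend_writes * 2))"   # 20000
echo "RAID6 backend IOs: $((frontend_writes * 6))"   # 60000
```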
RAID 1 VM Encryption
The graph below shows the comparison of IOPS between 6.7U3 and 7.0U2 with RAID1 VM Encryption.
RAID 6 VM Encryption
The graph below shows the comparison of IOPS between 6.7U3 and 7.0U2 with RAID6 VM Encryption.
Throughput
IOPS and throughput are closely related by the following equation.
Throughput (MB/s) = IOPS * Block size
IOPS measures the number of read and write operations per second, while throughput measures the amount of data read or written per second. The higher the throughput, the more data can be transferred. The graphs follow a consistent pattern from the heavier to the lighter workload tests. The larger block sizes, 64K and 128K, show greater throughput in each of the workload tests than 4K or 16K. As the block sizes get larger in a workload, the number of IOPS decreases; even though there are fewer IOPS, you get more data throughput because the blocks are bigger. The vSAN datastore is a native 4K system. It's important to remember that storage systems may be optimized for different block sizes. It is often the operating system and applications which set the block sizes, which then run on the underlying storage, so it is important to test different block sizes on storage systems to see the effect these have.
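As a quick worked example of that relationship (the IOPS figures here are hypothetical, not taken from these test runs):

```shell
# Throughput (MB/s) = IOPS x block size (KB) / 1024
# 20,000 IOPS at 4k vs 2,000 IOPS at 128k: fewer IOPS, far more throughput
awk 'BEGIN { printf "4k:   %.1f MB/s\n", 20000 * 4 / 1024 }'    # 78.1 MB/s
awk 'BEGIN { printf "128k: %.1f MB/s\n", 2000 * 128 / 1024 }'   # 250.0 MB/s
```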
With RAID1 VM Encryption at the lower block sizes, 4k and 16k, 7.0U2 performs better with greater throughput. At the higher block sizes, 64k and 128k, there is less of a difference, with 6.7U3 performing slightly better, but the margin is minimal.
With RAID6 VM Encryption, 7.0U2 generally shows higher throughput at the lower block sizes but not at the higher block sizes.
RAID1 VM Encryption
The graph below shows the comparison of throughput between 6.7U3 and 7.0U2 with RAID1 VM Encryption.
RAID6 VM Encryption
The graph below shows the comparison of throughput between 6.7U3 and 7.0U2 with RAID6 VM Encryption.
Average Latency
With RAID1 VM Encryption at the lower block sizes, 4k and 16k, 7.0U2 shows less latency, but at the higher block sizes it shows slightly more latency than 6.7U3.
With RAID6 VM Encryption, the 7.0U2 tests perform better, showing less latency than the 6.7U3 tests.
RAID1 VM Encryption
The graph below shows the comparison of average latency between 6.7U3 and 7.0U2 with RAID1 VM Encryption.
RAID6 VM Encryption
The graph below shows the comparison of average latency between 6.7U3 and 7.0U2 with RAID6 VM Encryption.
Read Latency
The pattern is consistent between the read/write workloads. As the workload decreases, read latency decreases although the figures are generally quite close. Read latency for all tests varies between 0.30 and 1.40ms which is under a generally recommended limit of 15-20ms before latency starts to cause performance problems.
RAID1 VM Encryption shows lower read latency for the 7.0U2 tests than 6.7U3. There are outlier values for read latency at the 4K and 16K block sizes when testing 2 threads, which may be something to note if applications will run at these block sizes.
RAID6 shows a slightly better read latency result than RAID1: RAID6 stripes data across more disks than a RAID1 mirror, so reads can be serviced from more disks in parallel, which is reflected in the results. Faster reads result in lower latency. Overall, 7.0U2 performs better than 6.7U3 apart from one value at the 128k block size with 2 threads, which may be an outlier.
RAID1 VM Encryption
RAID6 VM Encryption
Write Latency
The lowest write latency is 0.72ms and the largest is 9.56ms. Up to 20ms is the recommended limit from VMware; with all-flash arrays, these values are expected and well within the limits. With NVMe and flash disks, the faster hardware may expose bottlenecks elsewhere in the hardware stack and architecture, which can be compared with internal VMware host layer monitoring. Write latency can occur at several virtualization layers and filters, each of which causes its own latency. The layers can be seen below.
Latency can be caused by limits on the storage controller, queuing at the VMkernel layer, the disk IOPS limit being reached and the types of workloads being run possibly alongside other types of workloads which cause more processing.
With RAID1 Encryption, 7.0U2 performed better at the lower block sizes with less write latency than 6.7U3. However, on the higher block sizes, 64k and 128k, 6.7U3 performs slightly better, but we are talking 1-2ms.
With RAID6 VM Encryption, 7.0U2 performed well with less latency across all tests than 6.7U3.
As expected, all the RAID6 results incurred more write latency than the RAID1 results. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity, producing a heavy write penalty and therefore more latency.
RAID1 VM Encryption
RAID6 VM Encryption
Latency Standard Deviation
The standard deviation value in the testing results uses a 95th percentile. This is explained below with examples.
An average latency of 2ms and a 95th percentile of 6ms means that 95% of the IO were serviced under 6ms, and that would be a good result
An average latency of 2ms and a 95th percentile latency of 200ms means 95% of the IO were serviced under 200ms (keeping in mind that some will be higher than 200ms). This means that latencies are unpredictable and some may take a long time to complete. An operation could take less than 2ms, but every once in a while, it could take well over 200ms.
Assuming a good average latency, it is typical to see the 95th percentile latency no more than 3 times the average latency.
With RAID1 Encryption, 7.0U2 performed better at the lower block sizes with less latency standard deviation than 6.7U3. However, on the higher block sizes, 64k and 128k, 6.7U3 performs slightly better.
With RAID 6 VM Encryption, 7.0U2 performed with less standard deviation across all the tests.
RAID1 VM Encryption
RAID6 VM Encryption
ESXi CPU Usage %
With RAID1 VM Encryption, at the lower block sizes, 4k and 16k, 7.0U2 uses more CPU, but at the higher block sizes, 7.0U2 uses slightly less CPU.
With RAID6 VM Encryption, there is an increase in CPU usage across all 7.0U2 compared to 6.7U3 tests. RAID 6 has a higher computational penalty than RAID1.
RAID1 VM Encryption
RAID6 VM Encryption
Conclusion
The performance tests were designed to get an overall view from a low workload test of 30% Write, 70% Read through a series of increasing workload tests of 80% Write, 20% Read and 100% Write, 0% Read simulation. These tests used different block sizes to simulate different application block sizes. Testing was carried out on an all flash RAID1 and RAID6 vSAN datastore to compare the performance for VM encryption between ESXi 6.7U3 and 7.0U2. The environment was set up to vendor best practice across vSphere ESXi, vSAN, vCenter and the Dell server configuration.
RAID1 VM Encryption
With 6.7U3, IOPS at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
With 6.7U3, throughput at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
Overall latency for 6.7U3 at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
Read latency for 6.7U3 is higher than 7.0U2.
With 6.7U3, write latency at the higher block sizes, 64k and 128k, can be slightly better than 7.0U2, but not at the lower block sizes.
There is more standard deviation for 6.7U3 than 7.0U2.
At the lower block sizes, 6.7U3 uses less CPU on the whole, but at the higher block sizes, 7.0U2 uses less CPU.
RAID6 VM Encryption
There are higher IOPs for 7.0U2 than 6.7U3 across all tests.
There is generally a higher throughput for 7.0U2 at the lower block sizes, than 6.7U3 but not at the higher block sizes. However, the difference is minimal.
There is lower overall latency for 7.0U2 than 6.7U3 across all tests
There is lower read latency for 7.0U2 than 6.7U3 across all tests
There is lower write latency for 7.0U2 than 6.7U3 across all tests
There is less standard deviation for 7.0U2 than 6.7U3 across all tests
There is a higher CPU % usage for 7.0U2 than 6.7U3 across all tests
With newer processors, AES improvements, memory improvements, RDMA NICs and storage controller driver improvements, we may see further performance improvements in new server models.
An ESXi image (download from myvmware.com); use the depot zip
VMware PowerCLI and the ESXi Image Builder module
For more information on setting this up, see this blog. Thanks to Michelle Laverick.
Other software depots
The vSphere ESXi depot is the main software depot you will need but there are other depots provided by vendors who create collections of VIBs specially packaged for distribution. Depots can be Online and Offline. An online software depot is accessed remotely using the HTTP protocol. An offline software depot is downloaded and accessed locally. These depots have the vendor specific VIBs that you will need to combine with the vSphere ESXi depot in order to create your custom installation image. An example could be HP’s depot on this link
What are VIBS?
VIB actually stands for vSphere Installation Bundle. It is basically a collection of files packaged into a single archive to facilitate distribution. It is composed of 3 parts:
A file archive (The files which will be installed on the host)
An xml descriptor file (Describes the contents of the VIB. It contains the requirements for installing the VIB and identifies who created the VIB and the amount of testing that’s been done including any dependencies, any compatibility issues, and whether the VIB can be installed without rebooting.)
A signature file (Verifies the acceptance level of the VIB). There are 4 acceptance levels; see the next section.
Acceptance levels
Each VIB is released with an acceptance level that cannot be changed. The host acceptance level determines which VIBs can be installed to a host.
VMwareCertified
The VMwareCertified acceptance level has the most stringent requirements. VIBs with this level go through thorough testing fully equivalent to VMware in-house Quality Assurance testing for the same technology. Today, only I/O Vendor Program (IOVP) program drivers are published at this level. VMware takes support calls for VIBs with this acceptance level.
VMwareAccepted
VIBs with this acceptance level go through verification testing, but the tests do not fully test every function of the software. The partner runs the tests and VMware verifies the result. Today, CIM providers and PSA plug-ins are among the VIBs published at this level. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.
PartnerSupported
VIBs with the PartnerSupported acceptance level are published by a partner that VMware trusts. The partner performs all testing. VMware does not verify the results. This level is used for a new or nonmainstream technology that partners want to enable for VMware systems. Today, driver VIB technologies such as Infiniband, ATAoE, and SSD are at this level with nonstandard hardware drivers. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.
CommunitySupported
The CommunitySupported acceptance level is for VIBs created by individuals or companies outside of VMware partner programs. VIBs at this level have not gone through any VMware-approved testing program and are not supported by VMware Technical Support or by a VMware partner.
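The host-side acceptance level can be viewed and changed with esxcli; a host refuses to install VIBs below its configured level. For example:

```shell
# Show the host's current acceptance level
esxcli software acceptance get

# Lower it to allow PartnerSupported VIBs (run on the host itself)
esxcli software acceptance set --level=PartnerSupported
```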
Steps to create a custom ESXi image
1. I have an ESXi 7.0U1c software depot zip file and an Intel VIB which I will add into the custom image.
2. Open PowerCLI and connect to your vCenter
Connect-VIServer <vCenterServer>
3. Next I add my vSphere ESXi and Intel software depot zips
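The cmdlet for this is Add-EsxSoftwareDepot; the file paths below are placeholders for wherever you saved the downloaded zips:

```powershell
# Add the vSphere ESXi offline depot and the vendor (Intel) depot
Add-EsxSoftwareDepot -DepotUrl "C:\depots\VMware-ESXi-7.0U1c-depot.zip"
Add-EsxSoftwareDepot -DepotUrl "C:\depots\Intel-driver-depot.zip"
```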
4. If you want to check what packages are available once the software depots have been added, run:
Get-EsxSoftwarePackage
5. Next, we can check what image profiles are available. We are going to clone one of these profiles.
Get-EsxImageProfile
6. There are two ways to create a new image profile: you can create an empty image profile and manually specify the VIBs you want to add, or you can clone an existing image profile and use that. I have cloned an existing image profile.
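Cloning is done with New-EsxImageProfile -CloneProfile; the profile, name and vendor below are examples, so substitute the profile listed by Get-EsxImageProfile in your depot:

```powershell
# Clone an existing profile into an editable copy
New-EsxImageProfile -CloneProfile "ESXi-7.0U1c-17325551-standard" `
    -Name "ESXi-7.0U1c-custom" -Vendor "techlab"
```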
If I do a Get-EsxImageProfile now, I can see the new image profile I created
7. Next, I'll use Add-EsxSoftwarePackage to add and remove VIBs to/from the image profile. First of all, I'll check my extra Intel package to get the driver name, then I will add the software package.
Get-EsxSoftwarePackage | Where-Object {$_.Vendor -eq "INT"}
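With the package name in hand, it can be added to the cloned profile (the profile and package names here are examples; use the names from your own depot):

```powershell
# Add the Intel driver VIB to the custom image profile
Add-EsxSoftwarePackage -ImageProfile "ESXi-7.0U1c-custom" `
    -SoftwarePackage "intel-nvme-vmd"
```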
9. Just as a note, if you need to change the acceptance level, you can do so by running the following command before creating the ISO or zip. The example below shows changing the image profile to the PartnerSupported acceptance level.
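A sketch of that acceptance-level change, followed by the export step that produces the ISO or offline bundle (the profile name and paths are examples carried over from the earlier steps):

```powershell
# Relax the profile's acceptance level
Set-EsxImageProfile -ImageProfile "ESXi-7.0U1c-custom" -AcceptanceLevel PartnerSupported

# Export the finished profile as a bootable ISO or an offline bundle (zip)
Export-EsxImageProfile -ImageProfile "ESXi-7.0U1c-custom" -ExportToIso -FilePath "C:\depots\ESXi-7.0U1c-custom.iso"
Export-EsxImageProfile -ImageProfile "ESXi-7.0U1c-custom" -ExportToBundle -FilePath "C:\depots\ESXi-7.0U1c-custom-depot.zip"
```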
It has become a requirement for companies to protect both personally identifiable information and data, including protecting communications within and across environments. The EU General Data Protection Regulation (GDPR) is now a legal requirement for global companies to protect the personally identifiable information of all European Union residents. In the last year, the United Kingdom has left the EU; however, the General Data Protection Regulation will still be important to implement. “The Payment Card Industry Data Security Standards (PCI DSS) requires encrypted card numbers. The Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Acts (HIPAA/HITECH) require encryption of Electronic Protected Health Information (ePHI).” (Townsendsecurity, 2019) Little is known about the effect encryption has on the performance of different data held on virtual infrastructure. VM encryption and vSAN encryption are the two data protection options I will evaluate for a better understanding of the functionality and performance effect on software defined storage.
It may be important to understand encryption functionality in order to match business and legal requirements. Certain regulations may need to be met which only specific encryption solutions can provide. Additionally, encryption adds a layer of functionality which is known to have an effect on system performance. With systems which scale into thousands, it is critical to understand what effect encryption will have on functionality and performance in large environments. It will also help when purchasing hardware which has been designed for specific environments to allow some headroom in the specification for the overhead of encryption.
What will be used to test
| Key IT Aspects | Description |
|----------------|-------------|
| VMware vSphere ESXi servers | 8 x Dell R640 ESXi servers run the virtual lab environment and the software defined storage |
| HCIBench test machines | 80 x Linux Photon 1.0 virtual machines |
| vSAN storage | Virtual datastore combining all 8 ESXi servers' local NVMe disks. The datastore uses RAID (Redundant Array of Inexpensive Disks), a technique combining multiple disks together for data redundancy and performance |
| Key Encryption Management Servers | Clustered and load balanced Thales key management servers for encryption key management |
| Encryption Software | VM encryption and vSAN encryption |
| Benchmarking software | HCIBench v2.3.5 and Oracle Vdbench |
Test lab hardware (8 servers)

| Architecture | Details |
|--------------|---------|
| Server Model | Dell R640 1U rackmount |
| CPU Model | Intel Xeon Gold 6148 |
| CPU count | 2 |
| Core count | 20 per CPU |
| Processor AES-NI | Enabled in the BIOS |
| RAM | 768GB (12 x 64GB LRDIMM) |
| NIC | Mellanox ConnectX-4 Lx Dual Port 25GbE rNDC |
| O/S Disk | 1 x 240GB Solid State SATADOM |
| vSAN Data Disk | 3 x 4TB U.2 Intel P4510 NVMe |
| vSAN Cache Disk | 1 x 350GB Intel Optane P4800X NVMe |
| Physical switch | Cisco Nexus N9K-C93180YC-EX |
| Physical switch ports | 48 x 25GbE and 4 x 40GbE |
| Virtual switch type | VMware Virtual Distributed Switch |
| Virtual switch port types | Elastic |
HCIBench Test VMs
80 HCIBench Test VMs will be used for this test. I have placed 10 VMs on each of the 8 Dell R640 servers to provide a balanced configuration. No virtual machines other than the HCIBench test VMs will be run on this system to avoid interference with the testing.
The specification of the 80 HCIBench Test VMs is as follows.
| Resources | Details |
|-----------|---------|
| CPU | 4 |
| RAM | 8GB |
| O/S VMDK primary disk | 16GB |
| Data VMDK disk | 20GB |
| Network | 25Gb/s |
HCIBench Performance Metrics
| Workload Parameter | Explanation | Value |
|--------------------|-------------|-------|
| IOPS | IOPS measures the number of read and write operations per second | Input/Outputs per second |
| Throughput | Throughput measures the amount of data read or written per second (Average IO size x IOPS = Throughput in MB/s) | MB/s |
| Read Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a data read, latency is the time it takes for the data to come back | ms |
| Write Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a write, latency is the time for the write acknowledgement to return | ms |
| Latency Standard Deviation | Standard deviation is a measure of the amount of variation within a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range | Values must be compared to the standard deviation |
| Average ESXi CPU usage | Average ESXi Host CPU usage | % |
| Average vSAN CPU usage | Average CPU use for vSAN traffic only | % |
HCIBench Test Parameter Options
The HCIBench performance options allow you to set the block size and the types of read/write ratios. In these tests, I will be using the following block sizes to give a representation of the different types of applications you can see on corporate systems
4k
16k
64k
128k
In these tests I will be using the following Read/Write ratios to also give a representation of the different types of applications you can see on corporate systems
0% Read 100% Write
20% Read 80% Write
70% Read 30% Write
RAID Configuration
VM encryption will be tested on RAID1 and RAID6 vSAN storage
vSAN encryption will be tested on RAID1 and RAID6 vSAN storage
Note: vSAN encryption is not configured in the storage policy, as it is turned on at the datastore level, but we still need generic RAID1 and RAID6 storage policies.
VM encryption RAID1 storage policy
vCenter Storage Policy configuration:
Name = raid1_vsan_policy
Storage Type = vSAN
Failures to tolerate = 1 (RAID 1)
Thin provisioned = Yes
Number of disk stripes per object = 1
Encryption enabled = Yes
Deduplication and Compression enabled = No
VM encryption RAID6 storage policy
vCenter Storage Policy configuration:
Name = raid6_vsan_policy
Storage Type = vSAN
Failures to tolerate = 2 (RAID6)
Thin provisioned = Yes
Number of disk stripes per object = 1
Encryption enabled = Yes
Deduplication and Compression enabled = No
vSAN encryption RAID1 storage policy
vCenter Storage Policy configuration:
Name = raid1_vsan_policy
Storage Type = vSAN
Failures to tolerate = 1 (RAID 1)
Thin provisioned = Yes
Number of disk stripes per object = 1
Deduplication and Compression enabled = No
vSAN encryption RAID6 storage policy
vCenter Storage Policy configuration:
Name = raid6_vsan_policy
Storage Type = vSAN
Failures to tolerate = 2 (RAID6)
Thin provisioned = Yes
Number of disk stripes per object = 1
Deduplication and Compression enabled = No
Test Plans
The table below shows one individual test plan I have created. This plan is replicated for each of the tests listed below.
RAID1 Baseline
RAID1 VM Encryption
RAID1 vSAN Encryption
RAID6 Baseline
RAID6 VM Encryption
RAID6 vSAN Encryption
The tests were run for 3 hours each including a warm up and warm down period.
| Test | Number of disks | Working Set % | Number of threads | Block size (k) | Read % | Write % | Random % | Test time (s) |
|------|-----------------|---------------|-------------------|----------------|--------|---------|----------|---------------|
| 1 | 2 (O/S and Data) | 100% | 1 | 4k | 0 | 100 | 100 | 7200 |
| 2 | 2 (O/S and Data) | 100% | 2 | 4k | 0 | 100 | 100 | 7200 |
| 3 | 2 (O/S and Data) | 100% | 1 | 4k | 20 | 80 | 100 | 7200 |
| 4 | 2 (O/S and Data) | 100% | 2 | 4k | 20 | 80 | 100 | 7200 |
| 5 | 2 (O/S and Data) | 100% | 1 | 4k | 70 | 30 | 100 | 7200 |
| 6 | 2 (O/S and Data) | 100% | 2 | 4k | 70 | 30 | 100 | 7200 |
| 7 | 2 (O/S and Data) | 100% | 1 | 16k | 0 | 100 | 100 | 7200 |
| 8 | 2 (O/S and Data) | 100% | 2 | 16k | 0 | 100 | 100 | 7200 |
| 9 | 2 (O/S and Data) | 100% | 1 | 16k | 20 | 80 | 100 | 7200 |
| 10 | 2 (O/S and Data) | 100% | 2 | 16k | 20 | 80 | 100 | 7200 |
| 11 | 2 (O/S and Data) | 100% | 1 | 16k | 70 | 30 | 100 | 7200 |
| 12 | 2 (O/S and Data) | 100% | 2 | 16k | 70 | 30 | 100 | 7200 |
| 13 | 2 (O/S and Data) | 100% | 1 | 64k | 0 | 100 | 100 | 7200 |
| 14 | 2 (O/S and Data) | 100% | 2 | 64k | 0 | 100 | 100 | 7200 |
| 15 | 2 (O/S and Data) | 100% | 1 | 64k | 20 | 80 | 100 | 7200 |
| 16 | 2 (O/S and Data) | 100% | 2 | 64k | 20 | 80 | 100 | 7200 |
| 17 | 2 (O/S and Data) | 100% | 1 | 64k | 70 | 30 | 100 | 7200 |
| 18 | 2 (O/S and Data) | 100% | 2 | 64k | 70 | 30 | 100 | 7200 |
| 19 | 2 (O/S and Data) | 100% | 1 | 128k | 0 | 100 | 100 | 7200 |
| 20 | 2 (O/S and Data) | 100% | 2 | 128k | 0 | 100 | 100 | 7200 |
| 21 | 2 (O/S and Data) | 100% | 1 | 128k | 20 | 80 | 100 | 7200 |
| 22 | 2 (O/S and Data) | 100% | 2 | 128k | 20 | 80 | 100 | 7200 |
| 23 | 2 (O/S and Data) | 100% | 1 | 128k | 70 | 30 | 100 | 7200 |
| 24 | 2 (O/S and Data) | 100% | 2 | 128k | 70 | 30 | 100 | 7200 |
Results
Click on the graphs for a larger view
IOPS comparison for all RAID1 and RAID6 tests
IOPS measures the number of read and write operations per second. The pattern across the three different tests is consistent: the heavier write tests show the lowest IOPS, with IOPS gradually increasing as the proportion of writes decreases. IOPS and block size tend to have an inverse relationship: as the block size increases, each block takes longer to read, so the number of IOPS decreases, whereas smaller block sizes yield higher IOPS.
It is clear from the graphs that RAID1 VM encryption and RAID1 vSAN encryption produce more IOPS in every test than RAID6 VM encryption and RAID6 vSAN encryption. This is expected due to the increased overhead RAID6 incurs over RAID1: RAID1 results in 2 writes, one to each mirror, whereas a single RAID6 write operation results in 3 reads and 3 writes (due to double parity).
Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and finally write the second parity.
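The write penalty described above can be expressed as a short sketch; the I/O counts follow the description in the text, and the code is purely illustrative:

```python
# Back-end I/O cost of a single small front-end write for each RAID level,
# following the write-penalty description above.
raid_write_ios = {
    "RAID1": {"reads": 0, "writes": 2},  # one write to each mirror
    "RAID6": {"reads": 3, "writes": 3},  # read data + both parities, write data + both parities
}

def backend_ios(level):
    """Total back-end I/Os generated by one front-end write."""
    ios = raid_write_ios[level]
    return ios["reads"] + ios["writes"]

for level in raid_write_ios:
    print(f"{level}: {backend_ios(level)} back-end I/Os per front-end write")
```

RAID1 costs 2 back-end I/Os per write while RAID6 costs 6, which is why the RAID6 configurations trail in every write-heavy IOPS test.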
RAID1 VM encryption outperforms RAID1 vSAN encryption in terms of IOPS. The RAID6 results are interesting: at the lower block sizes RAID6 VM encryption outperforms RAID6 vSAN encryption, yet at the higher block sizes RAID6 vSAN encryption outperforms VM encryption.
In order of the highest IOPS:
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 VM encryption
RAID6 vSAN encryption
Throughput comparison for all RAID1 and RAID6 tests
IOPS and throughput are closely related by the following equation.
Throughput (MB/s) = IOPS * Block size
IOPS measures the number of read and write operations per second, while throughput measures the amount of data read or written per second. The higher the throughput, the more data can be transferred. The graphs follow a consistent pattern from the heavier to the lighter workload tests. The larger block sizes, such as 64K and 128K, achieve greater throughput in each workload test than 4K. As block sizes get larger in a workload, the number of IOPS decreases, but even with fewer IOPS you get more data throughput because each block carries more data. The vSAN datastore is natively a 4K system. It is important to remember that storage systems may be optimized for different block sizes; it is often the operating system and applications running on the underlying storage that set the block sizes, so it is worth testing different block sizes on storage systems to see the effect they have.
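The relationship can be sanity-checked with a short sketch; the IOPS figures below are illustrative, not measured results from these tests:

```python
# Throughput (MB/s) = IOPS * block size, converting the block size from KB to MB.
def throughput_mb_s(iops, block_size_kb):
    return iops * block_size_kb / 1024

# A drive typically sustains fewer IOPS at larger block sizes,
# yet still delivers more throughput overall (illustrative figures).
for iops, block_kb in [(20000, 4), (15000, 16), (8000, 64), (5000, 128)]:
    print(f"{block_kb:>4}K blocks @ {iops} IOPS -> {throughput_mb_s(iops, block_kb):8.1f} MB/s")
```

Even though the 128K run has a quarter of the IOPS of the 4K run, it moves roughly eight times the data per second.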
RAID1 VM encryption has the best throughput of the two RAID1 configurations, although the results are very close together.
RAID6 vSAN encryption has the best performance in terms of throughput against RAID6 VM encryption.
In order of highest throughput
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 vSAN encryption
RAID6 VM encryption
Read latency comparison for all RAID1 and RAID6 tests
The pattern is consistent between the read/write workloads. As the workload decreases, read latency decreases, although the figures are generally quite close. Read latency for all tests varies between 0.40ms and 1.70ms, which is under the generally recommended limit of 15ms before latency starts to cause performance problems.
There are outlier read latency values for RAID1 VM encryption and RAID1 vSAN encryption at 4K and 16K when testing 2 threads, which may be worth noting if applications will run at these block sizes.
RAID1 vSAN encryption incurs higher read latency in general than RAID1 VM encryption, and RAID6 VM encryption incurs higher read latency in general than RAID6 vSAN encryption; however, the figures are all very close to the baseline.
RAID6 has more disks to read from than RAID1's mirrored pair, so reads are very fast, which is reflected in the results. Faster reads result in lower latency.
From the lowest read latency to the highest
RAID6 vSAN encryption
RAID6 VM encryption
RAID1 VM encryption
RAID1 vSAN encryption
Write latency comparison for all RAID1 and RAID6 tests
The lowest write latency is 0.8ms and the highest is 9.38ms. VMware's recommended ceiling is 20ms; however, with all-flash arrays this should be significantly lower, which is what I can see in the results. With NVMe and flash disks, the faster hardware may expose bottlenecks elsewhere in the hardware stack and architecture, which can be compared with internal VMware host-layer monitoring. Write latency can occur at several virtualization layers and filters, each introducing its own latency. The layers can be seen below.
Latency can be caused by limits on the storage controller, queuing at the VMkernel layer, the disk IOPS limit being reached and the types of workloads being run possibly alongside other types of workloads which cause more processing.
The 100% write/0% read and 80% write/20% read tests show almost no change in write latency, but it decreases more significantly in the 30% write/70% read test.
As expected, all the RAID6 results incurred more write latency than the RAID1 results. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency.
When split into the RAID1 VM encryption and RAID1 vSAN encryption results, RAID1 VM encryption incurs less write latency than RAID1 vSAN encryption however the values are very close.
When split into the RAID6 VM encryption and RAID6 vSAN encryption results, RAID6 VM encryption performs with less write latency at the lower block sizes but with more write latency at the higher block sizes than RAID6 vSAN encryption.
From the lowest write latency to the highest.
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 vSAN encryption
RAID6 VM encryption
Latency Standard Deviation comparison for all RAID1 and RAID6 tests
The standard deviation value in the testing results uses a 95th percentile. This is explained below with examples.
An average latency of 2ms and a 95th percentile of 6ms means that 95% of the IO were serviced under 6ms, and that would be a good result
An average latency of 2ms and a 95th percentile latency of 200ms means 95% of the IO were serviced under 200ms (keeping in mind that some will be higher than 200ms). This means that latencies are unpredictable and some may take a long time to complete. An operation could take less than 2ms, but every once in a while it could take well over 200ms.
Assuming a good average latency, it is typical to see the 95th percentile latency no more than 3 times the average latency.
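The percentile rule of thumb above can be sketched in a few lines; the latency samples are illustrative, not the study's measured data:

```python
import math
import statistics

# Illustrative latency samples in ms (not the measured results).
latencies = [1.2, 1.4, 1.1, 1.3, 1.6, 1.2, 1.5, 1.8, 1.3, 2.9]

avg = statistics.mean(latencies)
# Nearest-rank 95th percentile: the value below which ~95% of IOs completed.
p95 = sorted(latencies)[math.ceil(0.95 * len(latencies)) - 1]

print(f"average = {avg:.2f} ms, 95th percentile = {p95:.2f} ms")
# Rule of thumb from the analysis: flag the run if p95 exceeds 3x the average.
print("unpredictable latency" if p95 > 3 * avg else "within 3x the average latency")
```

Here the 95th percentile (2.90ms) stays under 3 times the average (1.53ms), so this run would pass the check.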
I analysed the results to check whether the 95th percentile latency was more than 3 times the average latency for any test. I added new columns multiplying the average latency figures for all tests by 3 and comparing this to the standard deviation figure. The formula for these columns was =SUM(<relevant_latency_column>*3)
In the 80% write, 20% read test for the 64K RAID1 Baseline there was one result which was more than 3 times the average latency however not by a significant amount. In the 30% write, 70% read test for the 64K RAID6 Baseline, there were two results which were more than 3 times the average latency however not by a significant amount.
For all the RAID1 and RAID6 VM encryption and vSAN encryption tests, all standard deviation results were less than 3 times the average latency, indicating that AES-NI may give encryption a performance boost that prevents significant latency deviations.
ESXi CPU usage comparison for all RAID1 and RAID6 tests
I used a percentage change formula on the ESXi CPU usage data for all tests. Percentage change differs from percent increase and percent decrease formulas because it captures both directions of the change (negative or positive). VMware calculated, using a percentage change formula, that VM encryption added up to 20% overhead to CPU usage (this was for an older vSphere release). There are no figures for vSAN encryption from VMware, so I used the same formula for all tests. I used the formula below to calculate the percentage change for all tests.
% change = 100 x (test value – baseline value)/baseline value
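As a quick worked example of the formula, with illustrative CPU-usage figures rather than the measured results:

```python
def pct_change(test_value, baseline_value):
    """% change = 100 * (test value - baseline value) / baseline value.
    Negative values mean the encrypted test outperformed the baseline."""
    return 100 * (test_value - baseline_value) / baseline_value

# Illustrative CPU-usage percentages (not the measured data).
print(f"{pct_change(48.0, 42.0):+.2f}%")  # encryption added overhead vs baseline
print(f"{pct_change(39.0, 42.0):+.2f}%")  # negative: beat the baseline
```

A result of +14.29% would sit within VMware's 20% guideline, while -7.14% would indicate the encrypted run actually used less CPU than the baseline.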
The lowest percentage change is -7.73% and the highest is 18.37%, so the tests are all within VMware's guideline that encryption can add up to 20% more server CPU usage. Interestingly, the negative figures show an improvement over the baseline. This could be due to the way AES-NI boosts performance when encryption is enabled. RAID6 VM encryption and vSAN encryption show more results which outperformed the baseline in these tests than RAID1 VM encryption and vSAN encryption.
What is interesting about the RAID1 vSAN encryption and RAID6 vSAN encryption figures is that RAID1 vSAN encryption CPU usage goes up between 1 and 2 threads however RAID6 vSAN encryption CPU usage goes down between 1 and 2 threads.
Overall, there is a definite increase in CPU usage when VM encryption or vSAN encryption is enabled for both RAID1 and RAID6; however, looking at the graphs, the impact is minimal even at the higher workloads.
RAID6 VM encryption uses less CPU at the higher block sizes than RAID6 vSAN encryption.
From the lowest ESXi CPU Usage to the highest.
RAID6 VM encryption
RAID6 vSAN encryption
RAID1 VM encryption
RAID1 vSAN encryption
vSAN CPU usage comparison for all RAID1 and RAID6 tests
For the vSAN CPU usage comparison I used the same percentage change formula on the data. Percentage change differs from percent increase and percent decrease formulas because it shows both directions of the change (negative or positive); negative values indicate that vSAN CPU usage with encryption performed better than the baseline. VMware calculated, using a percentage change formula, that VM encryption would add up to 20% overhead. There are no figures for vSAN encryption from VMware, so I used the same formula for these tests too.
% change = 100 x (test value – baseline value)/baseline value
The lowest percentage change is -21.88% and the highest is 12.50%, so the tests are all within VMware's guideline that encryption in general can add up to 20% more CPU usage. Interestingly, the negative figures show an improvement over the baseline, which could be due to the way AES-NI boosts performance when encryption is enabled.
RAID1 VM encryption and RAID1 vSAN encryption use more vSAN CPU than RAID6 VM encryption and RAID6 vSAN encryption. All of the RAID6 VM encryption figures performed better than the RAID6 baseline, with the majority of RAID6 vSAN encryption figures also beating the baseline. In comparison, RAID1 VM encryption and RAID1 vSAN encryption nearly always used more CPU than the RAID1 baseline.
From the lowest vSAN CPU usage to the highest.
RAID6 VM encryption
RAID6 vSAN encryption
RAID1 vSAN encryption
RAID1 VM encryption
Conclusion
The following pages provide a final conclusion on the comparison between the functionality and performance of VM Encryption and vSAN Encryption.
Functionality
The main functionality differences can be summed up as follows.
VM encryption:
The DEK key is stored encrypted in the VMX file/VM advanced settings.
vSAN and VM encryption use the exact same encryption and KMIP libraries, but they have very different profiles; VM encryption is per-VM encryption.
VM encryption uses the vCenter Server for key transfer from the key management server. The hosts do not contact the key management server; only vCenter is a licensed key management client, reducing license costs.
vSAN encryption:
The DEK key is stored encrypted in metadata on each disk.
Enabled at the vSAN cluster datastore level; encryption happens at a different place in the hypervisor's layers.
Data travels unencrypted, but it is written encrypted to the cache layer.
Full compatibility with deduplication and compression.
More complicated to set up with a key management server, as each vendor has a different way of managing the trust between the key management server and the vCenter Server.
As with VM encryption, vCenter handles the key transfer from the key management server and the hosts do not contact it directly.
vSAN only; no other storage can be used with vSAN encryption.
Functionality conclusion
VM encryption and vSAN encryption are similar in some functionality. Both use a KMS server, both support RAID1, RAID5 and RAID6, and both use the same encryption libraries and the KMIP protocol. However, there are some fundamental differences. VM encryption gives the flexibility of encrypting individual virtual machines on a datastore, as opposed to vSAN encryption, which encrypts a complete datastore so that all VMs on it are automatically encrypted. Both solutions provide data-at-rest encryption, but only VM encryption provides end-to-end encryption, as it writes an encrypted data stream, whereas vSAN encryption receives an unencrypted data stream and encrypts it during the write process. Because of the level at which data is encrypted, VM encryption cannot be used with features such as deduplication and compression, whereas vSAN encryption can; whether this matters depends on whether that functionality is required and whether the space that could be saved is significant. VM encryption is datastore independent and can use vSAN, NAS, FC and iSCSI datastores; vSAN encryption can only be used on virtual machines on a vSAN datastore. Choosing the encryption depends on whether different types of storage reside in the environment and whether they require encryption.
The choice between VM encryption functionality and vSAN encryption functionality will be on a use case dependency of whether individual virtual machine encryption control is required and/or whether there is other storage in an organization targeted for encryption. If this is the case, VM encryption will be best. If these factors are not required and deduplication and compression are required, then vSAN encryption is recommended.
Performance conclusion
The performance tests were designed to get an overall view from a low workload test of 30% Write, 70% Read through a series of increasing workload tests of 80% Write, 20% Read and 100% Write, 0% Read simulation. These tests used different block sizes to simulate different application block sizes. Testing was carried out on an all flash RAID1 and RAID6 vSAN datastore to compare the performance for VM encryption and vSAN encryption. The environment was set up to vendor best practice across vSphere ESXi, vSAN, vCenter and the Dell server configuration.
It can be seen in all these tests that performance is affected by the below factors.
Block size.
Workload ratios.
RAID level.
Threads used
Application configuration settings.
Access pattern of the application.
The table below shows a breakdown of the performance, although in some cases the results are very close.
| Metric | 1st | 2nd | 3rd | 4th |
|--------|-----|-----|-----|-----|
| IOPS | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 VM encryption | RAID6 vSAN encryption |
| Throughput | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption |
| Read latency | RAID6 vSAN encryption | RAID6 VM encryption | RAID1 VM encryption | RAID1 vSAN encryption |
| Write latency | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption |
| Standard deviation | All four configurations: standard deviation results were less than 3 times the average latency (as recommended), with minor outliers | | | |
| ESXi CPU usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 VM encryption | RAID1 vSAN encryption |
| vSAN CPU usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 vSAN encryption | RAID1 VM encryption |
In terms of IOPS, RAID1 VM encryption produces the highest IOPS in all tests. This is expected due to the increased overhead RAID6 incurs over RAID1: RAID1 results in 2 writes, one to each mirror, whereas a single RAID6 write operation results in 3 reads and 3 writes (due to double parity), causing more latency and decreasing IOPS.
In terms of throughput, RAID1 VM encryption produces the highest throughput in all tests. Having produced the highest IOPS in the majority of tests, a similar result for throughput was expected. Whether your environment needs higher IOPS or higher throughput determines the block sizing: larger block sizes produce the best throughput because more data moves through the system in bigger blocks. As the block size increases, each block takes longer to read and the number of IOPS decreases; smaller block sizes yield higher IOPS.
In terms of read latency, RAID6 vSAN encryption performed best. Read latency for all tests varies between 0.40ms and 1.70ms, which is under the generally recommended limit of 15ms before latency starts to cause performance problems. RAID6 has more disks to read from than RAID1's mirrored pair, so reads are very fast, which is reflected in the results. Faster reads result in lower latency. The values overall were very close.
In terms of write latency, RAID1 VM encryption performed best. All the RAID6 results incurred more write latency than the RAID1 results which was to be expected. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency. The lowest write latency is 0.8ms and the largest is 9.38ms. Up to 20ms is the recommended value therefore all tests were well within acceptable limits.
The performance of encrypted data also seems to be enhanced by the use of newer flash disks such as SSDs and NVMe, showing latency figures within the acceptable values. NVMe uses a streamlined, lightweight protocol compared to the SAS, SCSI and AHCI protocols, while also reducing CPU cycles.
In terms of standard deviation, all standard deviation test results were less than 3 times the average latency which is recommended.
In terms of average ESXi CPU and vSAN CPU usage, RAID6 VM encryption produced the lowest increase in CPU. All encryption appeared to be enhanced by leveraging the AES-NI instructions in Intel and AMD CPU’s. The increase in CPU usage by the hosts and vSAN compared to the baseline for both sets of encryption tests is minimal and within acceptable margins by a considerable amount. In some cases, there was lower CPU use than the baseline possibly due to the AES-NI offload.
Encryption recommendation
Overall, RAID1 VM encryption produces the best IOPS, throughput and write latency, with its standard deviation latency values well under the acceptable limits. RAID1 ESXi CPU usage and vSAN CPU usage are higher than RAID6, but the difference is minimal when looking at the graphs, especially as in some cases both sets of tests outperform the baseline across the different block sizes. For applications which need very fast read performance, RAID6 will always be the best option because it has more disks to read from than RAID1's mirrored pair, so the encryption choice should be matched to the application's requirements if reads are a priority.
In vSphere 6.7 and earlier releases, UMDS is bundled with the vCenter Server Appliance installer. You can use the UMDS bundle from the vCenter Server Appliance to install UMDS 6.7 on a separate Linux-based system.
UMDS is a 64-bit application and requires a 64-bit Linux-based system.
You cannot upgrade UMDS running on a Linux-based operating system. Instead, uninstall the current version, perform a fresh installation of UMDS according to all system requirements, and reuse the existing patch store from the UMDS that you uninstalled.
Supported Operating Systems
The Update Manager Download Service (UMDS) can run on a limited number of Linux-based operating systems.
Once the VM is built, edit its settings and attach the ISO to the Ubuntu VM's CD drive via the datastore
On the “Virtual Hardware” tab, expand “CPU,” and select “Expose hardware assisted virtualization to guest OS.”
Right-click your newly created VM and select “Power On.”
Right-click the VM again and select “Open Console.”
The Ubuntu installation process should begin automatically and the first prompt is to choose a language. Select English and press Enter
Highlight “Install Ubuntu Server” and press Enter
Select English
Select your location – In my case the United Kingdom
On the “Configure the keyboard” screen select “Yes” and press Enter. Once it has taken you through the configuration, you will see the page below
It will run through the below screen
Enter a hostname
Ubuntu prompts you to create a user account to be used instead of a root account. Start by entering the full name (first and last) of the user and Press Enter
Enter a username for the account
Choose a password
Enter the password again
I chose not to encrypt my home directory, so select “No” and press Enter
Configure the clock
Select “Guided – use entire disk and set up LVM” and press Enter
Choose the disk to partition. I only have one
Select “Yes” and press enter to write the changes to disk and configure LVM.
Accept the default amount of the volume group that will be used for guided partitioning. This tells the installer to use the full disk and press Enter
Select Yes and press enter to write the changes to disk
It will proceed to install the system
The install will begin and at some point it will prompt you to enter your Internet proxy. I don’t use one, so I left it blank and pressed Enter
A dialog will be presented asking you how you want to manage system upgrades. In this case I’ll manually apply updates, so I selected “No automatic updates” and pressed Enter
On the software selection screen I chose just OpenSSH and PostgreSQL, as I am installing a VMware UMDS server
I clicked Yes to the Grub boot loader message
Finish the Installation
You are now ready to use the Ubuntu VM
Update VMware Tools
VMware Tools is a group of utilities and drivers that enhance the performance of the virtual machine’s guest operating system when running on an ESXi host. The steps below walk you through installing VMware Tools on our Ubuntu Server 14.04.06 virtual machine using the command line. Note that whenever you update the Linux kernel you will have to reinstall VMware Tools.
Launch a Web browser and login to the vSphere Web Client.
From the vCenter Home page click on “VMs and Templates.”
Right-click the VM and navigate to “All vCenter Actions” > “Guest OS” > “Install VMware Tools.”
When prompted click “Mount” to mount the VMware Tools installation disk image on the virtual CD/DVD drive of the Ubuntu Server virtual machine.
Right-click the VM again and select “Open Console.”
Login with the credentials used during the server installation process
Mount the VMware Tools CD image to /media/cdrom:
$ sudo mount /dev/cdrom /media/cdrom
mount: /dev/sr0 is write-protected, mounting read-only
Extract the VMware Tools installer archive file to /tmp
$ tar xzvf /media/cdrom/VMwareTools-*.tar.gz -C /tmp/
Install VMware Tools by running the command below. Note that the -d switch assumes that you want to accept the defaults. If you don’t use -d switch you can opt to choose the default or a custom setting for each question.
$ cd /tmp/vmware-tools-distrib/
$ sudo ./vmware-install.pl -d
...
The configuration of VMware Tools 9.4.5 build-1598834 for Linux for this running kernel completed successfully.
...
Reboot the virtual machine after the installation completes:
$ sudo reboot
Preparing the VM for UMDS
The next step is to prepare the VM for UMDS and then install it.
The following pre-requisite components for Linux are required, but read on, as the script handles them.
When you install UMDS manually, you are prompted for several responses and the script currently just uses those defaults. If you wish to change them, you simply just need to edit the “answer” file that the script generates to provide to the UMDS installer itself.
Here is what the script is doing at a high level:
Extract the UMDS installer into /tmp
Install all OS package dependencies
Create UMDS installer answer file /tmp/answer
Create the /etc/odbc.ini and /etc/odbcinst.ini configuration files
Update pg_hba.conf to allow the UMDS user to access the DB
Start the Postgres DB
Create the UMDSDB user and set the assigned password
Install UMDS
Procedure
Upload both the UMDS install script (install_umds65.sh) as well as the UMDS install package found in the VCSA 6.5 ISO to an already deployed Ubuntu system
The script needs to run as root and it requires the following 5 command-line options:
UMDS package installer
Name of the UMDS Database
Name of the UMDS DSN Entry
Username for running the UMDS service
Password for the UMDS username
Running the script
I found I had to adjust the permissions on the script first, which I did via WinSCP
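If you prefer the command line to WinSCP, the equivalent permission change is a one-liner (shown here against a placeholder file so the snippet is self-contained; substitute your actual script path):

```shell
# Placeholder file so the snippet runs standalone; in practice the script
# is already on the server after uploading it.
touch install_umds65.sh

# Make the installer script executable, then confirm the mode bits.
chmod +x install_umds65.sh
ls -l install_umds65.sh
```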
It will start to install and this is what you will see
Extracting VMware-UMDS-6.7.0-14203538.tar.gz to /tmp …
vmware-umds-distrib/
vmware-umds-distrib/bin/
vmware-umds-distrib/bin/7z
vmware-umds-distrib/bin/vciInstallUtils
vmware-umds-distrib/bin/vmware-umds
vmware-umds-distrib/bin/downloadConfig.xml
vmware-umds-distrib/bin/vciInstallUtils_config.xml
vmware-umds-distrib/bin/unzip
vmware-umds-distrib/bin/zip
vmware-umds-distrib/bin/umds
vmware-umds-distrib/bin/vmware-updatemgr-wrapper
vmware-umds-distrib/bin/vmware-vciInstallUtils
vmware-umds-distrib/EULA
vmware-umds-distrib/share/
vmware-umds-distrib/share/VCI_proc_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_proc_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_data_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_table_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_base_postgresql.sql
vmware-umds-distrib/share/VCI_undo_postgresql.sql
vmware-umds-distrib/share/VCI_initialsetup_postgresql.sql
vmware-umds-distrib/share/VCI_table_postgresql-100-110.sql
vmware-umds-distrib/share/VCI_data_postgresql-110-120.sql
vmware-umds-distrib/share/VCI_proc_postgresql.sql
vmware-umds-distrib/lib/
vmware-umds-distrib/lib/libvim-types.so
vmware-umds-distrib/lib/libboost_program_options-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libboost_serialization-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libvci-types.so
vmware-umds-distrib/lib/libssl.so.1.0.2
vmware-umds-distrib/lib/libvmacore.so
vmware-umds-distrib/lib/libvci-registrar.so
vmware-umds-distrib/lib/libboost_thread-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libvci-vcIntegrity.so
vmware-umds-distrib/lib/libufa-types.so
vmware-umds-distrib/lib/libssoclient.so
vmware-umds-distrib/lib/libvsanmgmt-types.so
vmware-umds-distrib/lib/liblog4cpp.so.4
vmware-umds-distrib/lib/libstdc++.so.6
vmware-umds-distrib/lib/libboost_filesystem-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libcares.so.2
vmware-umds-distrib/lib/libodbc.so.2
vmware-umds-distrib/lib/libufa-common.so
vmware-umds-distrib/lib/libcurl.so.4
vmware-umds-distrib/lib/libvmomi.so
vmware-umds-distrib/lib/libexpat.so
vmware-umds-distrib/lib/libboost_system-gcc48-mt-1_61.so.1.61.0
vmware-umds-distrib/lib/libnfclib.so
vmware-umds-distrib/lib/libcrypto.so.1.0.2
vmware-umds-distrib/lib/libz.so.1
vmware-umds-distrib/vmware-install.pl
Installing UMDS package dependencies …
Hit http://security.ubuntu.com trusty-security InRelease
Ign http://gb.archive.ubuntu.com trusty InRelease
Hit http://gb.archive.ubuntu.com trusty-updates InRelease
Hit http://gb.archive.ubuntu.com trusty-backports InRelease
Hit http://gb.archive.ubuntu.com trusty Release.gpg
Hit http://gb.archive.ubuntu.com trusty Release
[… similar "Hit" lines for the remaining trusty, trusty-updates, trusty-backports and trusty-security package indexes …]
Reading package lists… Done
Reading package lists… Done
Building dependency tree
Reading state information… Done
psmisc is already the newest version.
sed is already the newest version.
perl is already the newest version.
postgresql is already the newest version.
postgresql-contrib is already the newest version.
tar is already the newest version.
vim is already the newest version.
The following extra packages will be installed:
libltdl7 libodbc1 odbcinst odbcinst1debian2
Suggested packages:
libmyodbc tdsodbc unixodbc-bin
The following NEW packages will be installed
libltdl7 libodbc1 odbc-postgresql odbcinst odbcinst1debian2 unixodbc
0 to upgrade, 6 to newly install, 0 to remove and 32 not to upgrade.
Need to get 791 kB of archives.
After this operation, 2,607 kB of additional disk space will be used.
Get:1 http://gb.archive.ubuntu.com/ubuntu/ trusty/main libltdl7 amd64 2.4.2-1.7ubuntu1 [35.0 kB]
Get:2 http://gb.archive.ubuntu.com/ubuntu/ trusty/main libodbc1 amd64 2.2.14p2-5ubuntu5 [175 kB]
Get:3 http://gb.archive.ubuntu.com/ubuntu/ trusty/main odbcinst amd64 2.2.14p2-5ubuntu5 [12.6 kB]
Get:4 http://gb.archive.ubuntu.com/ubuntu/ trusty/main odbcinst1debian2 amd64 2.2.14p2-5ubuntu5 [40.6 kB]
Get:5 http://gb.archive.ubuntu.com/ubuntu/ trusty/universe odbc-postgresql amd64 1:09.02.0100-2ubuntu1 [507 kB]
Get:6 http://gb.archive.ubuntu.com/ubuntu/ trusty/main unixodbc amd64 2.2.14p2-5ubuntu5 [19.8 kB]
Fetched 791 kB in 0s (3,669 kB/s)
Selecting previously unselected package libltdl7:amd64.
(Reading database … 61291 files and directories currently installed.)
Preparing to unpack …/libltdl7_2.4.2-1.7ubuntu1_amd64.deb …
Unpacking libltdl7:amd64 (2.4.2-1.7ubuntu1) …
Selecting previously unselected package libodbc1:amd64.
Preparing to unpack …/libodbc1_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking libodbc1:amd64 (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbcinst.
Preparing to unpack …/odbcinst_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking odbcinst (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbcinst1debian2:amd64.
Preparing to unpack …/odbcinst1debian2_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking odbcinst1debian2:amd64 (2.2.14p2-5ubuntu5) …
Selecting previously unselected package odbc-postgresql:amd64.
Preparing to unpack …/odbc-postgresql_1%3a09.02.0100-2ubuntu1_amd64.deb …
Unpacking odbc-postgresql:amd64 (1:09.02.0100-2ubuntu1) …
Selecting previously unselected package unixodbc.
Preparing to unpack …/unixodbc_2.2.14p2-5ubuntu5_amd64.deb …
Unpacking unixodbc (2.2.14p2-5ubuntu5) …
Processing triggers for man-db (2.6.7.1-1ubuntu1) …
Setting up libltdl7:amd64 (2.4.2-1.7ubuntu1) …
Setting up libodbc1:amd64 (2.2.14p2-5ubuntu5) …
Setting up odbcinst (2.2.14p2-5ubuntu5) …
Setting up odbcinst1debian2:amd64 (2.2.14p2-5ubuntu5) …
Setting up odbc-postgresql:amd64 (1:09.02.0100-2ubuntu1) …
odbcinst: Driver installed. Usage count increased to 1.
Target directory is /etc
odbcinst: Driver installed. Usage count increased to 1.
Target directory is /etc
Setting up unixodbc (2.2.14p2-5ubuntu5) …
Processing triggers for libc-bin (2.19-0ubuntu6.14) …
Creating UMDS Installer answer file …
Creating /etc/odbc.ini …
Updating /etc/odbcinst.ini …
Updating pg_hba.conf …
Symlink /var/run/postgresql/.s.PGSQL.5432 /tmp/.s.PGSQL.5432 …
Starting Postgres …
Starting PostgreSQL 9.3 database server [ OK ]
Sleeping for 60 seconds for Postgres DB to be ready …
Creating UMDS DB + User …
SELECT pg_catalog.set_config('search_path', '', false)
CREATE ROLE umdsuser NOSUPERUSER CREATEDB CREATEROLE INHERIT LOGIN;
ALTER ROLE
Install VUM UMDS …
Installing VMware Update Manager Download Service.
Logs would be store at /var/log/vmware/vmware-updatemgr/umds
Creating the log directory if required….
In which directory do you want to install Download service?
[/usr/local/vmware-umds]
The path “/usr/local/vmware-umds” does not exist currently. This program is
going to create it, including needed parent directories. Is this what you want?
Let us setup some things for you…
Do you need proxy to connect to internet? [no]
One more thing…we need a storage location to store patches. Make sure you
have enough space in that location
Where do you want download service to store patches
[/var/lib/vmware-umds]
The path “/var/lib/vmware-umds” does not exist currently. This program is going
to create it, including needed parent directories. Is this what you want?
The installation of VMware Update Manager Download Service 6.7.0 build-14203538
completed successfully. You can decide to remove this software from your system
at any time by invoking the following command:
“/usr/local/vmware-umds/vmware-uninstall-umds.pl”.
Enjoy,
--the VMware team
Post script install
Once the UMDS installer script completes, you can verify it by running the following two commands, which provide the version of UMDS as well as the current configuration:
/usr/local/vmware-umds/bin/vmware-umds -v
/usr/local/vmware-umds/bin/vmware-umds -G
More setup Commands
Log in to the machine where UMDS is installed and open a terminal session.
The default installation location on 64-bit Linux is /usr/local/vmware-umds.
Enabling ESXi Updates and VA Updates
To set up a download of all ESXi host updates and all virtual appliance upgrades, run the following command:
vmware-umds -S --enable-host --enable-va
To set up a download of all ESXi host updates and disable the download of virtual appliance upgrades, run the following command:
vmware-umds -S --enable-host --disable-va
To set up a download of all virtual appliance upgrades and disable the download of host updates, run the following command:
vmware-umds -S --disable-host --enable-va
Changing the Patch Store folder
The default folder to which UMDS downloads patch binaries and patch metadata on a Linux machine is /var/lib/vmware-umds
This command downloads all the upgrades, patches and notifications from the configured sources for the first time. Subsequently, it downloads all new patches and notifications released after the previous UMDS download.
vmware-umds -D
You should now see the below when it finishes
Making the Content available via a Web Server
You have now successfully installed UMDS. Once you have downloaded all of your content, you will need to set up an HTTP server to make it available to the VUM instance in the vCenter Server Appliance. You can configure any popular HTTP server such as Nginx or Apache. For my lab, I used the tiny HTTP server that Python provides.
To make the content under /var/lib/vmware-umds available, just change into that directory and run the following command:
python -m SimpleHTTPServer
Then if you navigate to http://192.168.1.69:8000 in a browser, you should see your files
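For reference, SimpleHTTPServer is the Python 2 module name; on Python 3 it was renamed to http.server. A minimal programmatic sketch of the same thing (the directory path is an assumption from this setup):

```python
from functools import partial
from http.server import SimpleHTTPRequestHandler, ThreadingHTTPServer

def make_server(directory, port=8000):
    # Serve `directory` over HTTP, like running `python3 -m http.server` from it
    handler = partial(SimpleHTTPRequestHandler, directory=directory)
    return ThreadingHTTPServer(("0.0.0.0", port), handler)

# make_server("/var/lib/vmware-umds").serve_forever()  # blocks; Ctrl+C to stop
```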
You can now add this URL into your vCenter Update Manager download settings
Thanks
Thanks to William Lam, whose blog I followed to set this up.
Curl is available in the VMware vCenter Server Appliance command line interface. This small blog provides a simple example of using Curl to simulate a telnet connection to test port connectivity
To test port connectivity in VMware vCenter Server Appliance:
Log in as root user through the VMware vCenter Server Appliance console.
Run this command on the vCenter Server Appliance:
curl -v telnet://<target IP address>:<port number>
Example of testing port connectivity
All vCenter servers must have access to the UMDS server on port 80 (http)
The below screen-print shows a working curl test from a vCenter to a Windows UMDS Server on IP address 10.124.74.65 over port 80.
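The same check can be scripted when neither curl nor telnet is to hand; a minimal Python sketch of the TCP probe that curl -v telnet://host:port performs:

```python
import socket

def port_open(host, port, timeout=3.0):
    # Attempt a plain TCP connection, as `curl -v telnet://host:port` does
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Example: port_open("10.124.74.65", 80) returns True if the UMDS web server answers
```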
Using Netcat to test port connectivity from hosts
The telnet command is not available in any version of ESXi and, therefore, you must use netcat (nc) to confirm connectivity to a TCP port on a remote host. The syntax of the nc command is:
nc -z <target IP address> <port number>
Within AutoDeploy, we sometimes need to update our base ESXi image and this blog will go through the process to do this. We use the HPE Custom Image for VMware ESXi 6.5 U2 Offline Bundle currently but what if we want to add a security patch?
Steps
a) Download the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot from myvmware.com
b) Click the icon to add a new Software depot and add a name
We now see our Software Depot named VMware ESXi 6.5U2 including Patches
Click the green up arrow to upload the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot into the Software Depots within AutoDeploy.
There are filters which allow you to select the type of update and severity including information about the patch
We will download the latest critical security patch
It downloads as a zip file
Upload this file into AutoDeploy: on the Software Depots tab, click the green up arrow to upload the patch zip file
f) We are now going to clone the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot
Click on the VMware-ESXi-6.5.0-Update2-10719125-HPE-Gen9plus-650.U2.10.4.0.29-Apr2019-depot. Under Image Profiles select the vendor image and click Clone. We are cloning the vendor image so we can swap in the updated VIBs.
Fill in the Name, Vendor and description. Choose your newly created software depot
Choose Partner Supported from the drop-down
g) Leave this box for a minute as we need to check the bulletins associated with the security patch we downloaded – Link below for reference
What we see in this bulletin is the vibs which are updated
h) Use the search function in the clone wizard to find each of the updated VIBs. Un-select the existing version and select the new version to add it to the build. In the example below I have unticked the older version and ticked the newer version
Do the same for the other 3 affected VIBs. Uncheck the older one and tick the newer one
Check the final screen and click Finish
You should now be able to click on your software depot – VMware ESXi 6.5U2 including patches and see the Cloned Image Profile which contains the security patch
i) Now we can add our patched Image Profile into an AutoDeploy Rule
I’m not going to go through the whole process of creating a rule but as you can see below, I can now edit the deploy rule (must be deactivated to edit)
You can then select the software depot which will contain the patched ESXi image with the security patch
j) If you are updating an existing Deploy Rule then you will need to use PowerCLI to connect to the vCenter and run the below command to refresh the Autodeploy cache before rebooting a host and testing the image applies correctly
You can either run a single command against one host you want to test, or run a command which updates all the hosts at once. To repair a single host as a test, we can use the piped command below. If you get an empty string back, the cache is correct and ready to use the new image
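The single-host piped command isn't shown in the original screenshot; as a sketch, it looks like this in PowerCLI (the host name is a placeholder):

```
# Hypothetical host name; tests one host's rule-set compliance and repairs it
Get-VMHost "techlabesxi1" | Test-DeployRuleSetCompliance | Repair-DeployRuleSetCompliance
```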
This error comes up in a 6.5U2 environment running on HPE ProLiant BL460c Gen9 servers. The environment previously had fully compliant hosts, but now, when checking compliance of a host against a host profile, all you see is Host Compliance Unknown.
The Resolution
I’m still not entirely sure how we went from a state of Compliant hosts to Unknown but this is how we resolved it
Deleted the AutoDeploy Rule
Recreated the AutoDeploy Rule
Ran the below command. Any time you make a change to the active rule set that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, you need to update the rules in the active rule set and also update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet, or by running the single PowerCLI command below
foreach ($esx in Get-VMHost) { $esx | Test-DeployRuleSetCompliance | Repair-DeployRuleSetCompliance }
Checked the status of the atftpd service in vCenter. Note: we are using the inbuilt TFTP server in vCenter; this is no longer supported by VMware, but we find it works just fine. SolarWinds is a good alternative.
service atftpd status
service atftpd start
Rebooted one host while checking the startup in the ILO console.
You may need to remediate your hosts after boot-up, or, if everything is configured correctly, a host will simply boot up, add itself into the cluster and become compliant
vSphere Auto Deploy lets you provision hundreds of physical hosts with ESXi software.
Using Auto Deploy, administrators can manage large deployments efficiently. Hosts are network-booted from a central Auto Deploy server. Optionally, hosts are configured with a host profile of a reference host. The host profile can be set up to prompt the user for input. After boot-up and configuration are complete, the hosts are managed by vCenter Server just like other ESXi hosts.
Types of AutoDeploy Install
Auto Deploy can also be used for Stateless caching or Stateful install. There are several more options than there were before which are shown below in a screen-print from a host profile.
What is stateless caching?
Stateless caching addresses the risk of the Auto Deploy server being unavailable by caching the ESXi image on the host’s local storage. If AutoDeploy is unavailable then the host will boot from its locally cached image. There are a few things that need to be in place before stateless caching can be enabled:
Hosts should be set to boot from network first, and local disk second
Ensure that there is a disk with at least 1 GB available
The host should be set up to get its settings from a Host Profile as part of the AutoDeploy rule set.
To configure a host to use stateless caching, the host profile that it will receive needs to be updated with the relevant settings. To do so, edit the host profile, and navigate to the ‘System Image Cache Profile Settings’ section, and change the drop-down menu to ‘Enable stateless caching on the host’
Stateless caching can be seen in the below diagram
What are Stateful Installs?
It is also possible to have AutoDeploy install ESXi. When the host first boots it will pull the image from the AutoDeploy server, then on all subsequent restarts the host will boot from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.
AutoDeploy stateful installs are configured in the same way as stateless caching. Edit the host profile, this time changing the option to ‘Enable stateful installs on the host’:
AutoDeploy Architecture
Pre-requisites
A vSphere Auto Deploy infrastructure will contain the below components
vSphere vCenter Server – vSphere 6.7U1 is the best and most comprehensive option to date.
A DHCP server to assign IP addresses and TFTP details to hosts on boot up – Windows Server DHCP will do just fine.
A TFTP server to serve the iPXE boot loader
An ESXi offline bundle image – Download from my.vmware.com.
A host profile to configure and customize provisioned hosts – Use the vSphere Web Client.
ESXi hosts with PXE enabled network cards
1. VMware AutoDeploy Server
Serves images and host profiles to ESXi hosts.
vSphere Auto Deploy rules engine
Tells the vSphere Auto Deploy server which image profile and which host profile to serve to which host. Administrators use vSphere Auto Deploy to define the rules that assign image profiles and host profiles to hosts
2. Image Profile Server
Define the set of VIBs to boot ESXi hosts with.
VMware and VMware partners make image profiles and VIBs available in public depots. Use vSphere ESXi Image Builder to examine the depot and use the vSphere Auto Deploy rules engine to specify which image profile to assign to which host.
VMware customers can create a custom image profile based on the public image profiles and VIBs in the depot and apply that image profile to the host
3. Host Profiles
Define machine-specific configuration such as networking or storage setup. Use the host profile UI to create host profiles. You can create a host profile for a reference host and apply that host profile to other hosts in your environment for a consistent configuration
4. Host customization
Stores information that the user provides when host profiles are applied to the host. Host customization might contain an IP address or other information that the user supplied for that host. For more information about host customizations, see the vSphere Host Profiles documentation.
Host customization was called answer file in earlier releases of vSphere Auto Deploy
5. Rules and Rule Sets
Rules
Rules can assign image profiles and host profiles to a set of hosts, or specify the location (folder or cluster) of a host on the target vCenter Server system. A rule can identify target hosts by boot MAC address, SMBIOS information, BIOS UUID, Vendor, Model, or fixed DHCP IP address. In most cases, rules apply to multiple hosts. You create rules by using the vSphere Client or vSphere Auto Deploy cmdlets in a PowerCLI session. After you create a rule, you must add it to a rule set. Only two rule sets, the active rule set and the working rule set, are supported. A rule can belong to both sets (the default) or only to the working rule set. After you add a rule to a rule set, you can no longer change the rule. Instead, you copy the rule and replace items or patterns in the copy. If you are managing vSphere Auto Deploy with the vSphere Client, you can edit a rule if it is in an inactive state
You can specify the following parameters in a rule.
Active Rule Set
When a newly started host contacts the vSphere Auto Deploy server with a request for an image profile, the vSphere Auto Deploy server checks the active rule set for matching rules. The image profile, host profile, vCenter Server inventory location, and script object that are mapped by matching rules are then used to boot the host. If more than one item of the same type is mapped by the rules, the vSphere Auto Deploy server uses the item that is first in the rule set.
Working Rule Set
The working rule set allows you to test changes to rules before making the changes active. For example, you can use vSphere Auto Deploy cmdlets for testing compliance with the working rule set. The test verifies that hosts managed by a vCenter Server system are following the rules in the working rule set. By default, cmdlets add the rule to the working rule set and activate the rules. Use the NoActivate parameter to add a rule only to the working rule set.
You use the following workflow with rules and rule sets.
Make changes to the working rule set.
Test the working rule set rules against a host to make sure that everything is working correctly.
Refine and retest the rules in the working rule set.
Activate the rules in the working rule set. If you add a rule in a PowerCLI session and do not specify the NoActivate parameter, all rules that are currently in the working rule set are activated. You cannot activate individual rules
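As a sketch, the start of the workflow above might look like this in PowerCLI (the rule, image profile, and host profile names are hypothetical):

```
# Create a rule and add it to the working rule set only (no activation yet)
New-DeployRule -Name "LabRule" -Item "MyImageProfile", "MyHostProfile" -Pattern "vendor=HP"
Add-DeployRule -DeployRule "LabRule" -NoActivate
# Test hosts against the working rule set before activating
Get-VMHost | Test-DeployRuleSetCompliance
```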
AutoDeploy Boot Process
The boot process is different for hosts that have not yet been provisioned with vSphere Auto Deploy (first boot) and for hosts that have been provisioned with vSphere Auto Deploy and added to a vCenter Server system (subsequent boot).
First Boot Prerequisites
Before a first boot process, you must set up your system.
Set up a DHCP server that assigns an IP address to each host upon startup and that points the host to the TFTP server to download the iPXE boot loader from.
If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
Identify an image profile to be used in one of the following ways.
Choose an ESXi image profile in a public depot.
Create a custom image profile by using vSphere ESXi Image Builder, and place the image profile in a depot that the vSphere Auto Deploy server can access. The image profile must include a base ESXi VIB.
If you have a reference host in your environment, export the host profile of the reference host and define a rule that applies the host profile to one or more hosts.
Specify rules for the deployment of the host and add the rules to the active rule set.
First Boot Overview
When a host that has not yet been provisioned with vSphere Auto Deploy boots (first boot), the host interacts with several vSphere Auto Deploy components.
When the administrator turns on a host, the host starts a PXE boot sequence. The DHCP server assigns an IP address to the host and instructs the host to contact the TFTP server.
The host contacts the TFTP server and downloads the iPXE file (executable boot loader) and an iPXE configuration file.
iPXE starts executing. The configuration file instructs the host to make an HTTP boot request to the vSphere Auto Deploy server. The HTTP request includes hardware and network information.
In response, the vSphere Auto Deploy server performs these tasks:
Queries the rules engine for information about the host.
Streams the components specified in the image profile, the optional host profile, and optional vCenter Server location information.
The host boots using the image profile. If the vSphere Auto Deploy server provided a host profile, the host profile is applied to the host.
vSphere Auto Deploy adds the host to the vCenter Server system that vSphere Auto Deploy is registered with.
If a rule specifies a target folder or cluster on the vCenter Server system, the host is placed in that folder or cluster. The target folder must be under a data center.
If no rule exists that specifies a vCenter Server inventory location, vSphere Auto Deploy adds the host to the first datacenter displayed in the vSphere Client UI.
If the host profile requires the user to specify certain information, such as a static IP address, the host is placed in maintenance mode when the host is added to the vCenter Server system. You must reapply the host profile and update the host customization to have the host exit maintenance mode. When you update the host customization, answer any questions when prompted.
If the host is part of a DRS cluster, virtual machines from other hosts might be migrated to the host after the host has successfully been added to the vCenter Server system.
Subsequent Boots Without Updates
For hosts that are provisioned with vSphere Auto Deploy and managed by a vCenter Server system, subsequent boots can become completely automatic.
The administrator reboots the host.
As the host boots up, vSphere Auto Deploy provisions the host with its image profile and host profile.
Virtual machines are brought up or migrated to the host based on the settings of the host.
Standalone host. Virtual machines are powered on according to autostart rules defined on the host.
DRS cluster host. Virtual machines that were successfully migrated to other hosts stay there. Virtual machines for which no host had enough resources are registered to the rebooted host.
If the vCenter Server system is unavailable, the host contacts the vSphere Auto Deploy server and is provisioned with an image profile. The host continues to contact the vSphere Auto Deploy server until vSphere Auto Deploy reconnects to the vCenter Server system.
vSphere Auto Deploy cannot set up vSphere distributed switches if vCenter Server is unavailable, and virtual machines are assigned to hosts only if they participate in an HA cluster. Until the host is reconnected to vCenter Server and the host profile is applied, the switch cannot be created. Because the host is in maintenance mode, virtual machines cannot start.
Important: Any hosts that are set up to require user input are placed in maintenance mode
Subsequent Boots With Updates
You can change the image profile, host profile, vCenter Server location, or script bundle for hosts. The process includes changing rules and testing and repairing the host’s rule compliance.
The administrator uses the Copy-DeployRule PowerCLI cmdlet to copy and edit one or more rules and updates the rule set.
The administrator runs the Test-DeployRulesetCompliance cmdlet to check whether each host is using the information that the current rule set specifies.
The host returns a PowerCLI object that encapsulates compliance information.
The administrator runs the Repair-DeployRulesetCompliance cmdlet to update the image profile, host profile, or vCenter Server location the vCenter Server system stores for each host.
When the host reboots, it uses the updated image profile, host profile, vCenter Server location, or script bundle for the host. If the host profile is set up to request user input, the host is placed in maintenance mode
Note: Do not change the boot configuration parameters to avoid problems with your distributed switch
Prepare your system for AutoDeploy
Before you can PXE boot an ESXi host with vSphere Auto Deploy, you must install prerequisite software and set up the DHCP and TFTP servers that vSphere Auto Deploy interacts with.
Prerequisites
Verify that the hosts that you plan to provision with vSphere Auto Deploy meet the hardware requirements for ESXi. See ESXi Hardware Requirements.
Verify that the ESXi hosts have network connectivity to vCenter Server and that all port requirements are met. See vCenter Server Upgrade.
Verify that you have a TFTP server and a DHCP server in your environment to send files and assign network addresses to the ESXi hosts that Auto Deploy provisions.
Verify that the ESXi hosts have network connectivity to DHCP, TFTP, and vSphere Auto Deploy servers.
If you want to use VLANs in your vSphere Auto Deploy environment, you must set up the end to end networking properly. When the host is PXE booting, the firmware driver must be set up to tag the frames with proper VLAN IDs. You must do this set up manually by making the correct changes in the UEFI/BIOS interface. You must also correctly configure the ESXi port groups with the correct VLAN IDs. Ask your network administrator how VLAN IDs are used in your environment.
Verify that you have enough storage for the vSphere Auto Deploy repository. The vSphere Auto Deploy server uses the repository to store data it needs, including the rules and rule sets you create and the VIBs and image profiles that you specify in your rules. Best practice is to allocate 2 GB to have enough room for four image profiles and some extra space. Each image profile requires approximately 350 MB. Determine how much space to reserve for the vSphere Auto Deploy repository by considering how many image profiles you expect to use.
Obtain administrative privileges to the DHCP server that manages the network segment you want to boot from. You can use a DHCP server already in your environment, or install a DHCP server. For your vSphere Auto Deploy setup, replace the gpxelinux.0 filename with snponly64.efi.vmw-hardwired for UEFI or undionly.kpxe.vmw-hardwired for BIOS.
Secure your network as you would for any other PXE-based deployment method. vSphere Auto Deploy transfers data over SSL to prevent casual interference and snooping. However, the authenticity of the client or the vSphere Auto Deploy server is not checked during a PXE boot.
If you want to manage vSphere Auto Deploy with PowerCLI cmdlets, verify that Microsoft .NET Framework 4.5 or 4.5.x and Windows PowerShell 3.0 or 4.0 are installed on a Windows machine. You can install PowerCLI on the Windows system on which vCenter Server is installed or on a different Windows system. See the vSphere PowerCLI User’s Guide.
Set up a remote Syslog server. See the vCenter Server and Host Management documentation for Syslog server configuration information. Configure the first host you boot to use the remote Syslog server and apply that host’s host profile to all other target hosts. Optionally, install and use the vSphere Syslog Collector, a vCenter Server support tool that provides a unified architecture for system logging and enables network logging and combining of logs from multiple hosts.
Install ESXi Dump Collector, set up your first host so that all core dumps are directed to ESXi Dump Collector, and apply the host profile from that host to all other hosts.
If the hosts that you plan to provision with vSphere Auto Deploy are with legacy BIOS, verify that the vSphere Auto Deploy server has an IPv4 address. PXE booting with legacy BIOS firmware is possible only over IPv4. PXE booting with UEFI firmware is possible with either IPv4 or IPv6.
Starting to configure AutoDeploy
Step 1 – Enable the AutoDeploy, Image Builder Service and Dump Collector Service
Install vCenter Server or deploy the vCenter Server Appliance. The vSphere Auto Deploy server is included with the management node.
Configure the vSphere Auto Deploy service startup type.
On the vSphere Web Client Home page, click Administration.
Under System Configuration, click Services
Select Auto Deploy, click the Actions menu, and select Edit Startup Type and select Automatic
(Optional) If you want to manage vSphere Auto Deploy with the vSphere Web Client, configure the vSphere ESXi Image Builder service startup type
Check the Startup
Log out of the vSphere Web Client and log in again. The Auto Deploy icon is visible on the Home page of the vSphere Web Client
Enable the Dump Collector
You can now either set the dump collector manually on each host or configure the host profile with the settings
If you want to enter it manually and point the dump collector to the vCenter then the following commands are used
esxcli system coredump network set --interface-name vmk0 --server-ipv4 10.242.217.11 --server-port 6500
esxcli system coredump network set --enable true
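To confirm the host can actually reach the configured collector, ESXi also provides a check subcommand:

```
# Verifies the configured network coredump settings against the collector
esxcli system coredump network check
```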
Enable Automatic Startup
Step 2 Configure the TFTP server
There are different options here. Some people use SolarWinds, or there is now the option to use an inbuilt TFTP service on the vCenter
Important: The TFTP service in vCenter is only supported for dev and test environments, not production, and will be removed from future releases of vCenter. It is best to have a separate TFTP server.
Instructions
Now that Auto Deploy is enabled we can configure the TFTP server. Enable SSH on the VCSA by browsing to the Appliance Management page: https://VCSA:5480 where VCSA is the IP or FQDN of your appliance.
Log in as the root account. From the Access page enable SSH Login and Bash Shell.
SSH onto the vCenter Appliance, using a client such as Putty, and log in with the root account. First type shell and hit enter to launch Bash.
To start the TFTP service enter service atftpd start
Check the service is started using service atftpd status
To allow TFTP traffic through the firewall on port 69, we must run the following command. (Note the double dashes in front of dport.)
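The command itself appears to have dropped out of the original post; a typical rule that fits the description above (my reconstruction, note the double dashes in front of dport) would be:

```shell
# Allow inbound TFTP (UDP port 69) through the VCSA firewall
iptables -A INPUT -p udp --dport 69 -j ACCEPT
```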
Validate traffic is being accepted over port 69 using the following command
iptables -nL | grep 69
iptables can be found in /etc/systemd/scripts just for reference
Type chkconfig atftpd on to set the service to start on boot
To make the iptables rules persistent, load them after a reboot from a script.
Save the current active rules to a file
iptables-save > /etc/iptables.rules
Next create the below script and call it starttftp.sh
#!/bin/sh
#
# TFTP Start/Stop the TFTP service and allow port 69
#
# chkconfig: 345 80 05
# description: atftpd
### BEGIN INIT INFO
# Provides: atftpd
# Required-Start: $local_fs $remote_fs $network
# Required-Stop:
# Default-Start: 3 5
# Default-Stop: 0 1 2 6
# Description: TFTP
### END INIT INFO
service atftpd start
iptables-restore -c < /etc/iptables.rules
Put the starttftp.sh script in /etc/init.d via WinSCP
Set full permissions on the script so it can execute at boot
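Assuming the script was uploaded as /etc/init.d/starttftp.sh, setting the permissions from the Bash shell looks like this:

```shell
# Give the init script full permissions (read/write/execute for everyone)
chmod 777 /etc/init.d/starttftp.sh
```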
This should execute the command and reload the firewall tables after the system is rebooted
Reboot the vCenter appliance to test that the script runs. If successful, the atftpd service will be started and port 69 allowed; you can check these with service atftpd status and iptables -nL | grep 69.
Your TFTP directory is located at /tftpboot/
The tramp file on the vCenter must also now be modified: the DNS name must be removed and replaced with the IP address of the vCenter. Auto Deploy will not work without this step
The directory already contains the necessary files for Auto Deploy (the tramp file, undionly.kpxe.vmw-hardwired, etc.). Normally, if you use the Solarwinds TFTP server, you would need to download the TFTP Boot Zip and extract the files into the TFTP root folder
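As a quick sanity check that the files are actually being served, you can pull the tramp file over TFTP from another machine. This assumes curl (which supports tftp:// URLs) is available on that machine, and substitutes 10.242.217.11 for your vCenter's IP as used earlier:

```shell
# Fetch the tramp file over TFTP and display its contents
curl -o tramp tftp://10.242.217.11/tramp
cat tramp
```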
Note: there may be an issue downloading this file due to security restrictions enabled by some well-known browsers. You may have to modify a browser setting in order to access the file
If everything is ok then you’ll be able to download it but note again, you do not need to download this if you are using the inbuilt TFTP server in vCenter as the files are already there.
Step 3 – Setting up DHCP options
The DHCP server assigns an IP address to the ESXi host when the host boots. The DHCP server also provides two required options to point the host to the TFTP server and to the boot files necessary for vSphere Auto Deploy to work. These additional DHCP options are as follows:
066 – Boot Server Host Name – This option must be enabled, with the IP address of the server running TFTP entered in the data entry field.
067 – Bootfile Name – The “BIOS DHCP File Name” found in the vSphere Auto Deploy settings of your vCenter Server must be used here. The file name is undionly.kpxe.vmw-hardwired.
Go to Server Options and click Configure Options
In the value for option 066 (next-server), enter the FQDN of the TFTP boot server, in my case the vCenter Server hosting the TFTP service
Select option 067 and type in undionly.kpxe.vmw-hardwired. The undionly.kpxe.vmw-hardwired iPXE binary will be used to boot the ESXi host
Note: if you were using UEFI, you would need to put snponly64.efi.vmw-hardwired
You should now see the two options in DHCP
Next we need to add a scope and reservations to this scope
Right click IPv4 and select New Scope
A wizard will pop up
Put in a name and description
Put in the network IP range and subnet mask for the scope. Note: I have 3 hosts for testing.
Ignore the next screen and click Next
Ignore the next screen and click Next
Click No to configure options afterwards
Click Finish
We now need to create a DHCP reservation for each target ESXi host
In the DHCP window, navigate to DHCP > hostname > IPv4 > Autodeploy Scope > Reservations.
Right-click Reservations and select New Reservation.
In the New Reservation window, specify a name, IP address, and the MAC address for one of the hosts. Do not include the colon (:) in the MAC address.
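If you are scripting the reservations for a lot of hosts, stripping the colons out of a MAC address is a one-liner; the address below is just a made-up example:

```shell
# Remove the colons from a MAC address for the DHCP reservation field
mac="00:50:56:ab:cd:ef"
echo "$mac" | tr -d ':'
# prints 005056abcdef
```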
The initial installation and setup is now finished and we can now start with the next stage
Stage 4 – Image Builder and AutoDeploy GUI
The next stage involves logging into myvmware.com and downloading an offline bundle of the version of ESXi you need
Go to Home > Autodeploy in vCenter and select Add a Software Depot
Click Software Depots, then click Import Software Depot and upload the offline bundle. Keeping around 4 images in the depot is normally recommended for space reasons.
Once uploaded, click on the depot and you should see the image profiles it contains
If you click on an image, you get options above where you can clone or export to an iso for example
Stage 5 – Creating a Deploy Rule
A deploy rule gives you control over the deployment process, since you can specify which image profile is rolled out and to which servers. Once a rule is created you can also Edit or Clone it, but it has to be activated before it will apply; if rules are not activated, Auto Deploy will fail
Click on the Deploy Rules tab and add a name
Next we want to select hosts that match the following pattern. There are multiple options
Asset
Domain
Gateway IPv4
Hostname
IPv4
IPv6
MAC address
Model
OEM string
Serial number
UUID
Vendor
I am going to use an IP range of my 3 hosts which is 192.168.1.100-192.168.1.102
Next Select an Image Profile
Select the ESXi image to deploy to the hosts, change the software depot from the drop down menu if needed, then click Next. If you have any issues with vib signatures you can skip the signature checks using the tick box.
The main setting which must be chosen when creating a host profile to be used with AutoDeploy is how you want AutoDeploy to install an image as per below
Host Profile selection screen
Next Select a location
Next you will be on the Ready to Complete screen. Check the details and click Finish if you are happy
The deploy rule is created but in an inactive state. Select the deploy rule and note the options: Activate / Deactivate, Clone, Edit, Delete. Click Activate / Deactivate; a new window will open. Select the newly created deploy rule, click Activate, then Next and Finish.
Now the deploy rule is activated; when you boot a host where the deploy rule is applicable you will see it load ESXi and the customization specified in the host profile. Deploy rules need to be deactivated before they can be edited.
You can setup multiple deploy rules using different images for different clusters or host variables. Hosts using an Auto Deploy ruleset are listed in the Deployed Hosts tab, hosts that didn’t match any deploy rules are listed under Discovered Hosts
Stage 6 – Reboot the ESXi host and see if the AutoDeploy deployment works as expected.
When you reboot a host, it will then come up as per the below screenprint
Once booted up, remediate the host
If you type in the following URL – https://<vCenter IP>:6502/vmw/rbd, it should take you to the Auto Deploy Debugging page where you can view registered hosts along with a detailed view of host and PXE information as well as the Auto Deploy Cache content
What do you do when you need to modify the Image Profile or Host Profile?
There are 2 commands you need to run to ensure the hosts pick up the new data from the AutoDeploy rule, whether it be a new image or a new or modified host profile. If you don’t run these, you will likely find that when you reboot your vSphere hosts they still boot from the old image.
This situation occurs when you update the active ruleset without updating the corresponding host entries in the auto deploy cache. The first time a host boots the Auto Deploy server parses the host attributes against the active ruleset to determine (a) The image profile, (b) The host profile, and (c) the location of the host in the vCenter inventory. This information then gets saved in the auto deploy cache and reused on all future reboots. The strategy behind saving this information is to reduce the load on the auto deploy server by eliminating the need to parse each host against the rules engine on every reboot. With this approach each host only gets parsed against the active ruleset once (on the initial boot) after which the results get saved and reused on all subsequent reboots.
However, anytime you make a change to the active ruleset that results in a host using a different image profile or host profile, or being assigned to a different vCenter location, the cached information becomes stale. When you make changes, not only do you need to update the rules in the active ruleset, but you also need to update the host entries saved in the cache for the affected hosts. This is done using the Test-DeployRuleSetCompliance cmdlet together with the Repair-DeployRuleSetCompliance cmdlet.
Use the “Test-DeployRuleSetCompliance” cmdlet to check if the host information saved on the auto deploy server is up-to-date. This cmdlet parses the host attributes against the active ruleset and compares the results with the information saved in the cache. If the saved information is incorrect (i.e. out of compliance) the cmdlet will return a status of “Non-Compliant” and show what needs to be updated. If the information in the cache is correct, then the command will simply return an empty string.
In order to check one host, we can use Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local. It will tell us it is non-compliant
In order to repair a single host as a test, we can pipe the test cmdlet into the repair cmdlet, for example Test-DeployRuleSetCompliance lg-spsp-cex03.lseg.stockex.local | Repair-DeployRuleSetCompliance. If a subsequent compliance test returns an empty string, the cache is now correct and ready to use the new image
However, if we want to be clever about this because we have a lot of hosts, then we can run a quick simple PowerCLI “foreach” loop so we don’t have to update one host at a time
foreach ($esx in Get-VMHost) { $esx | Test-DeployRuleSetCompliance | Repair-DeployRuleSetCompliance }
At this point, I would now start the TFTP service on the vCenter. Note: If you are using Solarwinds, this is not necessary, unless you want to double check it is all OK first.
Next Reboot the hosts and check they come up as the right version, example of our environment below pre and post remediation
Other issues we faced!
Issue 1 – TFTP Service on the vCenter
We used the TFTP service built in to the vCenter. What you will find if you use this is that it will start, but then automatically stop itself after a while, which is fine; it’s just a case of remembering to start it. I found that with our HPE hosts, even after modifying the AutoDeploy rule and running the Test-DeployRuleSetCompliance and Repair-DeployRuleSetCompliance cmdlets, the hosts were still booting from cache. In the iLO screen, you could see the host picking up a DHCP address and the DHCP service passing the TFTP server to the host, but then it timed out. Once the service was started on the vCenter, it was fine.
service atftpd start
service atftpd status
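If you do carry on with the inbuilt service, one crude workaround for it stopping itself (my own hack, not anything VMware supports) is a cron entry on the VCSA that restarts atftpd whenever it is found stopped:

```shell
# Example /etc/cron.d entry: every 5 minutes, start atftpd if it is not running
*/5 * * * * root service atftpd status >/dev/null 2>&1 || service atftpd start
```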
Note: Apparently VMware do not support the inbuilt vCenter TFTP service, so when we asked how we could keep the service running, we were told they wouldn’t help with it. It is probably best to install something like Solarwinds, which will keep the service running continuously.
Issue 2 – HPE Oneview Setting for PXE Boot
We found that with HPE BL460 Blades with SSD cards installed, sometimes an empty host would boot up and lock a partition. This resulted in the host profile not being able to be applied, settings all over the place, and there was absolutely no way of getting round it. We could only resolve it by using gparted to wipe the disk and boot again. There seemed to be no logic to it, though: 5 out of 10 fresh hosts would boot up fine, while the other 5 would lock the partition.
This is what you would see if you hover over the error in vCenter
Don't think about what can happen in a month. Don't think what can happen in a year. Just focus on the 24 hours in front of you and do what you can to get closer to where you want to be :-)