The BIOS (Basic Input/Output System) is the first piece of software that runs when a computer starts, and it carries out the following tasks.
Performing POST (Power-On Self-Test) – in this phase the BIOS checks that the components installed on the motherboard are functioning.
Basic I/O checks – this verifies that peripherals such as the keyboard, the monitor and serial ports can operate and perform basic tasks.
Booting – the BIOS tries to boot from the connected devices (SSDs, HDDs, PXE and so on) in order to load an operating system to operate the computer.
It can also act as a low-level management tool, providing some ability to tweak system features and settings.
What is UEFI?
UEFI stands for Unified Extensible Firmware Interface. UEFI was released in 2007 as a successor to the BIOS, designed to overcome its limitations. Before this, computers used the BIOS (Basic Input/Output System). Most UEFI firmware implementations still provide support for legacy BIOS services.
UEFI Advantages over BIOS
32-bit/64-bit architecture rather than 16-bit
CPU independent architecture
Ability to use large disk partitions over 2TB. UEFI’s theoretical size limit for bootable drives is more than nine zettabytes, while BIOS can only boot from drives 2TB or smaller.
Flexible pre-OS environment, including network capability, GUI, multi language
Expanded setup interface with a GUI and mouse support
UEFI Secure Boot feature, which employs digital signatures to verify the integrity of low-level code like boot loaders and operating system files before execution. If validation fails, Secure Boot halts execution of the compromised bits to stop any potential attack in its tracks. Secure Boot was added in version 2.2 of the UEFI specification
UEFI does not use the Master Boot Record (MBR) scheme to store the low-level bits that bootstrap the operating system. Under the MBR, these key bits reside in the first segment of the disk, and any corruption or damage to that area stops the operating system from loading. Instead, UEFI uses the GUID Partition Table (GPT) scheme and stores initialization code in an .efi file found in a hidden partition. GPT also stores redundant copies of this code and uses cyclic redundancy checks to detect changes or corruption of the data
C / C++ language used instead of assembly language
When building Windows 10 or Windows Server 2016 VMs, it is recommended to build them with EFI firmware enabled. Converting from traditional BIOS/MBR to EFI (UEFI) firmware afterwards introduces challenges further down the line and can cause machines not to boot.
UEFI still cannot be used for auto deploying vSphere ESXi hosts but this may change in the future.
To build a custom image you will need:
An ESXi image (downloaded from myvmware.com) – use the depot zip
VMware PowerCLI and the ESXi Image Builder module
For more information on setting this up, see this blog. Thanks to Michelle Laverick.
Other software depots
The vSphere ESXi depot is the main software depot you will need but there are other depots provided by vendors who create collections of VIBs specially packaged for distribution. Depots can be Online and Offline. An online software depot is accessed remotely using the HTTP protocol. An offline software depot is downloaded and accessed locally. These depots have the vendor specific VIBs that you will need to combine with the vSphere ESXi depot in order to create your custom installation image. An example could be HP’s depot on this link
What are VIBS?
VIB actually stands for vSphere Installation Bundle. It is basically a collection of files packaged into a single archive to facilitate distribution. It is composed of 3 parts:
A file archive (The files which will be installed on the host)
An xml descriptor file (Describes the contents of the VIB. It contains the requirements for installing the VIB and identifies who created the VIB and the amount of testing that’s been done including any dependencies, any compatibility issues, and whether the VIB can be installed without rebooting.)
A signature file (Verifies the acceptance level of the VIB) There are 4 acceptance levels. See next paragraph
Acceptance levels
Each VIB is released with an acceptance level that cannot be changed. The host acceptance level determines which VIBs can be installed to a host.
VMwareCertified
The VMwareCertified acceptance level has the most stringent requirements. VIBs with this level go through thorough testing fully equivalent to VMware in-house Quality Assurance testing for the same technology. Today, only I/O Vendor Program (IOVP) program drivers are published at this level. VMware takes support calls for VIBs with this acceptance level.
VMwareAccepted
VIBs with this acceptance level go through verification testing, but the tests do not fully test every function of the software. The partner runs the tests and VMware verifies the result. Today, CIM providers and PSA plug-ins are among the VIBs published at this level. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.
PartnerSupported
VIBs with the PartnerSupported acceptance level are published by a partner that VMware trusts. The partner performs all testing. VMware does not verify the results. This level is used for a new or nonmainstream technology that partners want to enable for VMware systems. Today, driver VIB technologies such as Infiniband, ATAoE, and SSD are at this level with nonstandard hardware drivers. VMware directs support calls for VIBs with this acceptance level to the partner’s support organization.
CommunitySupported
The CommunitySupported acceptance level is for VIBs created by individuals or companies outside of VMware partner programs. VIBs at this level have not gone through any VMware-approved testing program and are not supported by VMware Technical Support or by a VMware partner.
Steps to create a custom ESXi image
1. I have an ESXi 7.0U1c software depot zip file and an Intel VIB which I will add into the custom image
2. Open PowerCLI and connect to your vCenter
Connect-VIServer <vCenterServer>
3. Next I add my vSphere ESXi and Intel software depot zips, as shown below.
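The file names below are placeholders for the depot zips I downloaded; Add-EsxSoftwareDepot accepts either a local zip path or an online depot URL.
# Add the ESXi offline depot and the vendor (Intel) depot - paths are illustrative
Add-EsxSoftwareDepot "C:\Depots\VMware-ESXi-7.0U1c-depot.zip"
Add-EsxSoftwareDepot "C:\Depots\Intel-driver-offline_bundle.zip"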
4. If you want to check what packages are available once the software depots have been added, run:
Get-EsxSoftwarePackage
5. Next we can check what image profiles are available. We are going to clone one of these profiles:
Get-EsxImageProfile
6. There are two ways to create a new image profile: you can create an empty image profile and manually specify the VIBs you want to add, or you can clone an existing image profile and use that. I have cloned an existing image profile, as shown below.
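A sketch of the clone command follows; the source profile name comes from the Get-EsxImageProfile output for your depot, and the new name and vendor are illustrative.
# Clone an existing profile into a new, editable image profile
New-EsxImageProfile -CloneProfile "ESXi-7.0U1c-17325551-standard" -Name "ESXi-7.0U1c-Custom-Intel" -Vendor "MyLab"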
If I do a Get-EsxImageProfile now, I can see the new image profile I created
7. Next, I'll use Add-EsxSoftwarePackage to add and remove VIBs to/from the image profile. First of all I'll check my extra Intel package to get the driver name, then I will add the software package.
Get-EsxSoftwarePackage | where {$_.Vendor -eq "INT"}
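With the package name returned above, adding the VIB to the cloned profile might look like this (the profile and package names are illustrative); Remove-EsxSoftwarePackage works the same way if a VIB needs to come back out.
# Add the Intel driver VIB to the cloned image profile
Add-EsxSoftwarePackage -ImageProfile "ESXi-7.0U1c-Custom-Intel" -SoftwarePackage "intel-nvme-vmd"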
9. Just as a note, if you need to change the acceptance level, you can do so by running the following command before creating the ISO or zip. The example below shows changing the image profile to the PartnerSupported acceptance level.
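A hedged sketch of that acceptance level change plus the export to an ISO and an offline bundle; the profile name and output paths are mine, not from the original screenshots.
# Change the acceptance level, then export the custom profile to an ISO and an offline bundle
Set-EsxImageProfileAcceptanceLevel -ImageProfile "ESXi-7.0U1c-Custom-Intel" -AcceptanceLevel PartnerSupported
Export-EsxImageProfile -ImageProfile "ESXi-7.0U1c-Custom-Intel" -ExportToIso -FilePath "C:\Depots\ESXi-7.0U1c-Custom-Intel.iso"
Export-EsxImageProfile -ImageProfile "ESXi-7.0U1c-Custom-Intel" -ExportToBundle -FilePath "C:\Depots\ESXi-7.0U1c-Custom-Intel.zip"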
Canonical and Microsoft have linked up to provide the ability to run Linux on Windows. Developers can also use Cygwin, MSYS, or run Linux in a virtual machine, but these workarounds have their own disadvantages and can overload systems. Bash on Windows provides a Windows subsystem, and Ubuntu Linux runs on top of it.
Basically, Windows allows you to run the same Bash shell that you find on Linux. This way you can run Linux commands inside Windows without needing to install a virtual machine or dual boot Linux/Windows. You install Linux inside Windows like a regular application. This is a good option if you want to learn Linux/Unix commands.
How to enable
Go to Control Panel – Programs and Features – Turn Windows Features on and off.
Enable Windows Subsystem for Linux and Virtual Machine Platform
Reboot
Go to the Windows store and search for Linux or Ubuntu. Install the distribution you want. In my case Ubuntu.
Once Ubuntu has installed, you will need to set up a username and password
This only occurs on the first run. The Bash shell will be available to use the next time you log in.
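If you prefer the command line, the same features can be enabled from an elevated PowerShell prompt; on recent Windows 10/11 builds a single wsl command will enable the features and install a distribution in one go (treat the distribution name as an example).
# Enable the two Windows features used by WSL (a reboot is still required)
Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Windows-Subsystem-Linux -NoRestart
Enable-WindowsOptionalFeature -Online -FeatureName VirtualMachinePlatform -NoRestart
# On newer builds this single command enables the features and installs Ubuntu
wsl --install -d Ubuntu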
When you open the Bash shell in Windows, you are literally running Ubuntu. Developers can now run Bash scripts, Linux command-line tools like sed, awk, grep, and Linux-first tools like Ruby, Git, Python, etc. directly on Windows.
Search for bash or wsl in the Windows search box
Almost all Linux commands can be used in the Bash shell on Windows
Searching for bash or wsl will display a "Run command" entry that can be selected to instantly open the Bash shell. The difference with either of these methods is that they open in the /mnt/c/Windows/System32 directory, so you start off browsing the System32 subdirectory of the Windows 10 installation.
Or you can simply open the Ubuntu app
Examples
You can run sudo apt-get update and sudo apt-get upgrade to obtain and install updates along with all usual Linux commands.
It has become a requirement for companies to protect both personally identifiable information and data, including communications within and across environments. The EU General Data Protection Regulation (GDPR) is now a legal requirement for global companies to protect the personally identifiable information of all European Union residents. In the last year the United Kingdom has left the EU; however, the General Data Protection Regulation will still be important to implement. "The Payment Card Industry Data Security Standards (PCI DSS) requires encrypted card numbers. The Health Insurance Portability and Accountability Act and Health Information Technology for Economic and Clinical Health Acts (HIPAA/HITECH) require encryption of Electronic Protected Health Information (ePHI)." (Townsendsecurity, 2019) Little is known about the effect encryption has on the performance of different data held on virtual infrastructure. VM encryption and vSAN encryption are the two data protection options I will evaluate for a better understanding of their functionality and performance effect on software defined storage.
It may be important to understand encryption functionality in order to match business and legal requirements. Certain regulations may need to be met which only specific encryption solutions can provide. Additionally, encryption adds a layer of functionality which is known to have an effect on system performance. With systems which scale into thousands, it is critical to understand what effect encryption will have on functionality and performance in large environments. It will also help when purchasing hardware which has been designed for specific environments to allow some headroom in the specification for the overhead of encryption.
What will be used to test
| Key IT Aspects | Description |
| --- | --- |
| VMware vSphere ESXi servers | 8 x Dell R640 ESXi servers run the virtual lab environment and the software defined storage |
| HCIBench test machines | 80 x Linux Photon 1.0 virtual machines |
| vSAN storage | Virtual datastore combining all 8 ESXi servers' local NVMe disks. The datastore uses RAID (redundant array of inexpensive disks), a technique combining multiple disks together for data redundancy and performance |
| Key Encryption Management Servers | Clustered and load balanced Thales key management servers for encryption key management |
| Encryption Software | VM encryption and vSAN encryption |
| Benchmarking software | HCIBench v2.3.5 and Oracle Vdbench |
Test lab hardware
8 servers
| Architecture | Details |
| --- | --- |
| Server Model | Dell R640 1U rackmount |
| CPU Model | Intel Xeon Gold 6148 |
| CPU count | 2 |
| Core count | 20 per CPU |
| Processor AES-NI | Enabled in the BIOS |
| RAM | 768GB (12 x 64GB LRDIMM) |
| NIC | Mellanox ConnectX-4 Lx Dual Port 25GbE rNDC |
| O/S Disk | 1 x 240GB Solid State SATADOM |
| vSAN Data Disk | 3 x 4TB U2 Intel P4510 NVMe |
| vSAN Cache Disk | 1 x 350GB Intel Optane P4800X NVMe |
| Physical switch | Cisco Nexus N9K-C93180YC-EX |
| Physical switch ports | 48 x 25GbE and 4 x 40GbE |
| Virtual switch type | VMware Virtual Distributed Switch |
| Virtual switch port types | Elastic |
HCIBench Test VMs
80 HCIBench Test VMs will be used for this test. I have placed 10 VMs on each of the 8 Dell R640 servers to provide a balanced configuration. No virtual machines other than the HCIBench test VMs will be run on this system to avoid interference with the testing.
The specification of the 80 HCIBench test VMs is as follows.
| Resources | Details |
| --- | --- |
| CPU | 4 |
| RAM | 8GB |
| O/S VMDK primary disk | 16GB |
| Data VMDK disk | 20GB |
| Network | 25Gb/s |
HCIBench Performance Metrics
| Workload Parameter | Explanation | Value |
| --- | --- | --- |
| IOPS | IOPS measures the number of read and write operations per second | Input/outputs per second |
| Throughput | Throughput measures the number of bits read or written per second. Average IO size x IOPS = throughput in MB/s | MB/s |
| Read Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a data read, latency is the time it takes for the data to come back | ms |
| Write Latency | Latency is the response time when you send a small I/O to a storage device. If the I/O is a write, latency is the time for the write acknowledgement to return | ms |
| Latency Standard Deviation | Standard deviation is a measure of the amount of variation within a set of values. A low standard deviation indicates that the values tend to be close to the mean of the set, while a high standard deviation indicates that the values are spread out over a wider range | Values must be compared to the average latency |
| Average ESXi CPU usage | Average ESXi host CPU usage | % |
| Average vSAN CPU usage | Average CPU use for vSAN traffic only | % |
HCIBench Test Parameter Options
The HCIBench performance options allow you to set the block size and the types of read/write ratios. In these tests, I will be using the following block sizes to give a representation of the different types of applications you can see on corporate systems
4k
16k
64k
128k
In these tests I will be using the following Read/Write ratios to also give a representation of the different types of applications you can see on corporate systems
0% Read 100% Write
20% Read 80% Write
70% Read 30% Write
RAID Configuration
VM encryption will be tested on RAID1 and RAID6 vSAN storage
vSAN encryption will be tested on RAID1 and RAID6 vSAN storage
Note: encryption is not enabled in the storage policy for the vSAN encryption tests, as vSAN encryption is turned on at the datastore level, but we still need generic RAID1 and RAID6 storage policies.
VM encryption RAID1 storage policy
| Test Parameters | Configuration |
| --- | --- |
| vCenter Storage Policy | Name = raid1_vsan_policy; Storage Type = vSAN; Failures to tolerate = 1 (RAID 1); Thin provisioned = Yes; Number of disk stripes per object = 1; Encryption enabled = Yes; Deduplication and Compression enabled = No |
VM encryption RAID6 storage policy
| Test Parameters | Configuration |
| --- | --- |
| vCenter Storage Policy | Name = raid6_vsan_policy; Storage Type = vSAN; Failures to tolerate = 2 (RAID6); Thin provisioned = Yes; Number of disk stripes per object = 1; Encryption enabled = Yes; Deduplication and Compression enabled = No |
vSAN encryption RAID1 storage policy
| Test Parameters | Configuration |
| --- | --- |
| vCenter Storage Policy | Name = raid1_vsan_policy; Storage Type = vSAN; Failures to tolerate = 1 (RAID 1); Thin provisioned = Yes; Number of disk stripes per object = 1; Deduplication and Compression enabled = No |
vSAN encryption RAID6 storage policy
| Test Parameters | Configuration |
| --- | --- |
| vCenter Storage Policy | Name = raid6_vsan_policy; Storage Type = vSAN; Failures to tolerate = 2 (RAID6); Thin provisioned = Yes; Number of disk stripes per object = 1; Deduplication and Compression enabled = No |
Test Plans
The table below shows one individual test plan I have created. This plan is replicated for each of the tests listed below.
RAID1 Baseline
RAID1 VM Encryption
RAID1 vSAN Encryption
RAID6 Baseline
RAID6 VM Encryption
RAID6 vSAN Encryption
The tests were run for 3 hours each including a warm up and warm down period.
| Test | Number of disks | Working Set % | Number of threads | Block size (k) | Read % | Write % | Random % | Test time (s) |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| 1 | 2 (O/S and Data) | 100% | 1 | 4k | 0 | 100 | 100 | 7200 |
| 2 | 2 (O/S and Data) | 100% | 2 | 4k | 0 | 100 | 100 | 7200 |
| 3 | 2 (O/S and Data) | 100% | 1 | 4k | 20 | 80 | 100 | 7200 |
| 4 | 2 (O/S and Data) | 100% | 2 | 4k | 20 | 80 | 100 | 7200 |
| 5 | 2 (O/S and Data) | 100% | 1 | 4k | 70 | 30 | 100 | 7200 |
| 6 | 2 (O/S and Data) | 100% | 2 | 4k | 70 | 30 | 100 | 7200 |
| 7 | 2 (O/S and Data) | 100% | 1 | 16k | 0 | 100 | 100 | 7200 |
| 8 | 2 (O/S and Data) | 100% | 2 | 16k | 0 | 100 | 100 | 7200 |
| 9 | 2 (O/S and Data) | 100% | 1 | 16k | 20 | 80 | 100 | 7200 |
| 10 | 2 (O/S and Data) | 100% | 2 | 16k | 20 | 80 | 100 | 7200 |
| 11 | 2 (O/S and Data) | 100% | 1 | 16k | 70 | 30 | 100 | 7200 |
| 12 | 2 (O/S and Data) | 100% | 2 | 16k | 70 | 30 | 100 | 7200 |
| 13 | 2 (O/S and Data) | 100% | 1 | 64k | 0 | 100 | 100 | 7200 |
| 14 | 2 (O/S and Data) | 100% | 2 | 64k | 0 | 100 | 100 | 7200 |
| 15 | 2 (O/S and Data) | 100% | 1 | 64k | 20 | 80 | 100 | 7200 |
| 16 | 2 (O/S and Data) | 100% | 2 | 64k | 20 | 80 | 100 | 7200 |
| 17 | 2 (O/S and Data) | 100% | 1 | 64k | 70 | 30 | 100 | 7200 |
| 18 | 2 (O/S and Data) | 100% | 2 | 64k | 70 | 30 | 100 | 7200 |
| 19 | 2 (O/S and Data) | 100% | 1 | 128k | 0 | 100 | 100 | 7200 |
| 20 | 2 (O/S and Data) | 100% | 2 | 128k | 0 | 100 | 100 | 7200 |
| 21 | 2 (O/S and Data) | 100% | 1 | 128k | 20 | 80 | 100 | 7200 |
| 22 | 2 (O/S and Data) | 100% | 2 | 128k | 20 | 80 | 100 | 7200 |
| 23 | 2 (O/S and Data) | 100% | 1 | 128k | 70 | 30 | 100 | 7200 |
| 24 | 2 (O/S and Data) | 100% | 2 | 128k | 70 | 30 | 100 | 7200 |
Results
IOPS comparison for all RAID1 and RAID6 tests
IOPS measures the number of read and write operations per second. The pattern for the 3 different workload ratios is consistent: the heavier write tests show the lowest IOPS, gradually increasing as the proportion of writes decreases. IOPS and block size tend to have an inverse relationship: as the block size increases, it takes longer to read or write a single block and the number of IOPS decreases, whereas smaller block sizes yield higher IOPS.
It is clear to see from the graphs that RAID1 VM encryption and RAID1 vSAN encryption produce more IOPS in all tests than RAID6 VM encryption and RAID6 vSAN encryption. This is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror, whereas a single RAID6 write operation results in 3 reads and 3 writes (due to double parity).
Each write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity.
RAID1 VM encryption outperforms RAID1 vSAN encryption in terms of IOPS. The RAID6 results are interesting: at the lower block sizes RAID6 VM encryption outperforms RAID6 vSAN encryption, however at the higher block sizes RAID6 vSAN encryption outperforms RAID6 VM encryption.
In order of the highest IOPs
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 VM encryption
RAID 6 vSAN encryption
Throughput comparison for all RAID1 and RAID6 tests
IOPs and throughput are closely related by the following equation.
Throughput (MB/s) = IOPS * Block size
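As a worked example with illustrative numbers rather than test results: a workload sustaining 10,000 IOPS at a 64KB block size equates to roughly 10,000 x 64KB = 640MB/s of throughput, whereas the same 10,000 IOPS at a 4KB block size is only around 40MB/s.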
IOPS measures the number of read and write operations per second, while throughput measures the number of bits read or written per second. The higher the throughput, the more data can be transferred. The graphs follow a consistent pattern from the heavier to the lighter workload tests. I can see that the larger block sizes such as 64K and 128K achieve greater throughput in each of the workload tests than 4K or 16K. As the block sizes get larger in a workload, the number of IOPS will decrease; even though there are fewer IOPS, you get more data throughput because the block sizes are bigger. The vSAN datastore is a native 4K system. It's important to remember that storage systems may be optimized for different block sizes. It is often the operating system and applications which set the block sizes that then run on the underlying storage, so it is important to test different block sizes on storage systems to see the effect they have.
RAID1 VM encryption has the best performance in terms of throughput against RAID1 vSAN encryption however the results are very close together.
RAID6 vSAN encryption has the best performance in terms of throughput against RAID6 VM encryption.
In order of highest throughput
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 vSAN encryption
RAID6 VM encryption
Read Latency comparison for all RAID1 and RAID6 tests
The pattern is consistent between the read/write workloads. As the workload decreases, read latency decreases although the figures are generally quite close. Read latency for all tests varies between 0.40 and 1.70ms which is under a generally recommended limit of 15ms before latency starts to cause performance problems.
There are outlier values for the Read Latency across RAID1 VM Encryption and RAID1 vSAN encryption at 4K and 16K when testing 2 threads which may be something to note if applications will be used at these block sizes.
RAID1 vSAN encryption incurs a higher read latency in general than RAID1 VM encryption, and RAID6 VM encryption incurs a higher read latency in general than RAID6 vSAN encryption; however, all the figures are very close to the baseline.
RAID6 has more disks to read from than mirrored RAID1, therefore the reads are very fast, which is reflected in the results. Faster reads result in lower latency.
From the lowest read latency to the highest
RAID6 vSAN encryption
RAID6 VM encryption
RAID1 VM encryption
RAID1 vSAN encryption
Write latency comparison for all RAID1 and RAID6 tests
The lowest write latency is 0.8ms and the largest is 9.38ms. Up to 20ms is the recommended limit from VMware; however, with all-flash arrays this should be significantly lower, which is what I can see from the results. With NVMe and flash disks, the faster hardware may expose bottlenecks elsewhere in the hardware stack and architecture, which can be investigated with internal VMware host-layer monitoring. Write latency can be introduced at several virtualization layers and filters, each of which causes its own latency. The layers can be seen below.
Latency can be caused by limits on the storage controller, queuing at the VMkernel layer, the disk IOPS limit being reached and the types of workloads being run possibly alongside other types of workloads which cause more processing.
The set of tests at the 100% write/0% read and 80% write/20% read have nearly no change in the write latency but it does decrease more significantly for the 30% write/70% read test.
As expected, all the RAID6 results incurred more write latency than the RAID1 results. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency.
When split into the RAID1 VM encryption and RAID1 vSAN encryption results, RAID1 VM encryption incurs less write latency than RAID1 vSAN encryption however the values are very close.
When split into the RAID6 VM encryption and RAID6 vSAN encryption results, RAID6 VM encryption seems to perform with less write latency at the lower block sizes however performs with more write latency at the higher block sizes than RAID6 vSAN encryption.
From the lowest write latency to the highest.
RAID1 VM encryption
RAID1 vSAN encryption
RAID6 vSAN encryption
RAID6 VM encryption
Latency Standard Deviation comparison for all RAID1 and RAID6 tests
The standard deviation value in the testing results uses a 95th percentile. This is explained below with examples.
An average latency of 2ms and a 95th percentile of 6ms means that 95% of the IO were serviced under 6ms, and that would be a good result
An average latency of 2ms and a 95th percentile latency of 200ms means 95% of the IO were serviced under 200ms (keeping in mind that some will be higher than 200ms). This means that latencies are unpredictable and some may take a long time to complete. An operation could take less than 2ms, but every once in a while it could take well over 200ms.
Assuming a good average latency, it is typical to see the 95th percentile latency no more than 3 times the average latency.
I analysed the results to see if the 95th percentile latency was no more than 3 times the average latency for all tests. I added new columns multiplying the latency figures for all tests by 3 and then compared this to the standard deviation figure. The formula for these columns was =SUM(<relevant_latency_column>*3).
In the 80% write, 20% read test for the 64K RAID1 Baseline there was one result which was more than 3 times the average latency however not by a significant amount. In the 30% write, 70% read test for the 64K RAID6 Baseline, there were two results which were more than 3 times the average latency however not by a significant amount.
For all the RAID1 and RAID6 VM encryption and vSAN encryption tests, all standard deviation results overall were less than 3 times the average latency indicating that potentially, AES-NI may give encryption a performance enhancement which prevents significant latency deviations.
ESXi CPU usage comparison for all RAID1 and RAID6 tests
I used a percentage change formula on the ESXi CPU usage data for all tests. Percentage change differs from percent increase and percent decrease formulas because both directions of the change (negative or positive) are captured. VMware calculated, using a percentage change formula, that VM encryption added up to 20% overhead to CPU usage (this was for an older vSphere release). There are no figures for vSAN encryption from VMware, so I have used the same formula for all tests. I used the formula below to calculate the percentage change for all tests.
% change = 100 x (test value – baseline value)/baseline value
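As a worked example with made-up numbers: if a baseline test averaged 20% ESXi CPU usage and the equivalent encrypted test averaged 22%, the change is 100 x (22 - 20)/20 = +10%; if the encrypted test instead averaged 19%, the change is 100 x (19 - 20)/20 = -5%, an improvement over the baseline.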
The lowest percentage change is -7.73% and the highest percentage change is 18.37% so the tests are all within VMware’s recommendation that encryption can add up to 20% more server CPU usage. Interestingly when the figures are negative, it shows an improvement over the baseline. This could be due to the way AES-NI boosts performance when encryption is enabled. RAID6 VM Encryption and vSAN encryption show more results which outperformed the baseline in these tests than RAID1 VM Encryption and vSAN encryption.
What is interesting about the RAID1 vSAN encryption and RAID6 vSAN encryption figures is that RAID1 vSAN encryption CPU usage goes up between 1 and 2 threads however RAID6 vSAN encryption CPU usage goes down between 1 and 2 threads.
Overall, there is a definite increase in CPU usage when VM encryption or vSAN encryption is enabled for both RAID1 and RAID6 however from looking at graphs, the impact is minimal even at the higher workloads.
RAID6 VM encryption uses less CPU at the higher block sizes than RAID6 vSAN encryption.
From the lowest ESXi CPU Usage to the highest.
RAID6 VM encryption
RAID6 vSAN encryption
RAID1 VM encryption
RAID1 vSAN encryption
vSAN CPU usage comparison for all RAID1 and RAID6 tests
For the vSAN CPU usage comparison I used the same percentage change formula on the vSAN CPU usage data. Percentage change differs from percent increase and percent decrease formulas because both directions of the change (negative or positive) can be seen; negative values indicate that vSAN CPU usage with encryption performed better than the baseline. VMware calculated, using a percentage change formula, that VM encryption would add up to 20% overhead. There are no figures for vSAN encryption from VMware, so I have used the same formula for these tests also.
% change = 100 x (test value – baseline value)/baseline value
The lowest percentage change is -21.88% and the highest percentage change is 12.50% so the tests are all within VMware’s recommendation that encryption in general can add up to 20% more CPU usage. Interestingly when the figures are negative, it shows an improvement over the baseline. This could be due to the way AES-NI boosts performance when encryption is enabled.
RAID1 VM encryption and RAID1 vSAN encryption use more vSAN CPU than RAID6 VM encryption and RAID6 vSAN encryption. All of the RAID6 VM encryption figures performed better than the RAID6 baseline, with the majority of RAID6 vSAN encryption figures also performing better than the baseline. In comparison, RAID1 VM encryption and RAID1 vSAN encryption nearly always used more CPU than the RAID1 baseline.
From the lowest vSAN CPU usage to the highest.
RAID6 VM encryption
RAID6 vSAN encryption
RAID1 vSAN encryption
RAID1 VM encryption
Conclusion
The following sections provide a final conclusion on the comparison between the functionality and performance of VM encryption and vSAN encryption.
Functionality
The main functionality differences can be summed up as follows
The DEK key is stored encrypted in the VMX file/VM advanced settings.
vSAN and VM encryption use the exact same encryption and kmip libraries but they have very different profiles. VM Encryption is a per-VM encryption.
VM Encryption utilizes the vCenter server for key management server key transfer. The hosts do not contact the key management server; only vCenter is a licensed key management client, reducing license costs.
Enabled on a virtual cluster datastore level. Encryption is happening at different places in the hypervisor’s layers.
Data travels unencrypted, but it is written encrypted to the cache layer.
Full compatibility with deduplication and compression.
More complicated to set up with a key management server, as each vendor has a different way of managing the trust between the key management server and the vCenter Server.
The DEK key is stored encrypted in metadata on each disk.
vSAN and VM encryption use the exact same libraries but they have very different profiles.
VM Encryption utilizes the vCenter server for key management server key transfer. The hosts do not contact the key management server; only vCenter is a licensed key management client, reducing license costs.
vSAN only; no other storage can be used with vSAN encryption.
Functionality conclusion
VM encryption and vSAN encryption are similar in some functionality. Both use a KMS server, both support RAID1, RAID5 and RAID6 encryption, and both use the same encryption libraries and the KMIP protocol. However, there are some fundamental differences. VM encryption gives the flexibility of encrypting individual virtual machines on a datastore, as opposed to encrypting a complete datastore with vSAN encryption, where all VMs are automatically encrypted. Both solutions provide data-at-rest encryption, but only VM encryption provides end-to-end encryption, as it writes an encrypted data stream, whereas vSAN encryption receives an unencrypted data stream and encrypts it during the write process. Due to the level at which data is encrypted, VM encryption cannot be used with features such as deduplication and compression, however vSAN encryption can. It depends whether this functionality is required and whether the space which could be saved is significant. VM encryption is datastore independent and can use vSAN, NAS, FC and iSCSI datastores. vSAN encryption can only be used on virtual machines on a vSAN datastore. Choosing the encryption depends on whether different types of storage reside in the environment and whether they require encryption.
The choice between VM encryption functionality and vSAN encryption functionality will be on a use case dependency of whether individual virtual machine encryption control is required and/or whether there is other storage in an organization targeted for encryption. If this is the case, VM encryption will be best. If these factors are not required and deduplication and compression are required, then vSAN encryption is recommended.
Performance conclusion
The performance tests were designed to get an overall view from a low workload test of 30% Write, 70% Read through a series of increasing workload tests of 80% Write, 20% Read and 100% Write, 0% Read simulation. These tests used different block sizes to simulate different application block sizes. Testing was carried out on an all flash RAID1 and RAID6 vSAN datastore to compare the performance for VM encryption and vSAN encryption. The environment was set up to vendor best practice across vSphere ESXi, vSAN, vCenter and the Dell server configuration.
It can be seen in all these tests that performance is affected by the below factors.
Block size.
Workload ratios.
RAID level.
Threads used
Application configuration settings.
Access pattern of the application.
The table below shows a breakdown of the performance rankings, although in some cases the results are very close.
| Metric | 1st | 2nd | 3rd | 4th |
| --- | --- | --- | --- | --- |
| IOPS | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 VM encryption | RAID6 vSAN encryption |
| Throughput | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption |
| Read Latency | RAID6 vSAN encryption | RAID6 VM encryption | RAID1 VM encryption | RAID1 vSAN encryption |
| Write Latency | RAID1 VM encryption | RAID1 vSAN encryption | RAID6 vSAN encryption | RAID6 VM encryption |
| Standard Dev | All standard deviation results were less than 3 times the average latency, which is recommended, with minor outliers (applies to all four configurations) | | | |
| ESXi CPU Usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 VM encryption | RAID1 vSAN encryption |
| vSAN CPU Usage | RAID6 VM encryption | RAID6 vSAN encryption | RAID1 vSAN encryption | RAID1 VM encryption |
In terms of IOPS, RAID1 VM encryption produces the highest IOPS for all tests. This is expected due to the increased overhead RAID6 incurs over RAID1 in general. RAID1 results in 2 writes, one to each mirror, whereas a single RAID6 write operation results in 3 reads and 3 writes (due to double parity), causing more latency and decreasing the IOPS.
In terms of throughput, RAID1 VM encryption produces the highest throughput for all tests. Having produced the highest IOPS in the majority of tests, it was expected to produce a similar result for throughput. Whether your environment needs higher IOPS or higher throughput comes down to the block sizing. Larger block sizes produce the best throughput because more data moves through the system in bigger blocks; as the block size increases, it takes longer to read a single block and the number of IOPS decreases, whereas smaller block sizes yield higher IOPS.
In terms of read latency, RAID6 vSAN encryption performed best in the read latency tests. Read latency for all tests varies between 0.40 and 1.70ms, which is under the generally recommended limit of 15ms before latency starts to cause performance problems. RAID6 has more disks to read from than mirrored RAID1, therefore the reads are very fast, which is reflected in the results. Faster reads result in lower latency. The values overall were very close.
In terms of write latency, RAID1 VM encryption performed best. All the RAID6 results incurred more write latency than the RAID1 results which was to be expected. Each RAID6 write operation requires the disks to read the data, read the first parity, read the second parity, write the data, write the first parity and then finally write the second parity producing a heavy write penalty and therefore more latency. The lowest write latency is 0.8ms and the largest is 9.38ms. Up to 20ms is the recommended value therefore all tests were well within acceptable limits.
The performance of encrypted data also seems to be enhanced by the use of newer flash disks such as SSDs and NVMe, showing latency figures within the acceptable values. NVMe in particular uses a streamlined, lightweight protocol compared with the SAS, SCSI and AHCI protocols, while also reducing CPU cycles.
In terms of standard deviation, all standard deviation test results were less than 3 times the average latency which is recommended.
In terms of average ESXi CPU and vSAN CPU usage, RAID6 VM encryption produced the lowest increase in CPU. All encryption appeared to be enhanced by leveraging the AES-NI instructions in Intel and AMD CPUs. The increase in CPU usage by the hosts and vSAN compared to the baseline for both sets of encryption tests is minimal and comfortably within acceptable margins. In some cases there was lower CPU use than the baseline, possibly due to the AES-NI offload.
Encryption recommendation
Overall, RAID1 VM encryption produces the best IOPS, throughput and write latency, and the standard deviation values for latency are well under the acceptable limits. RAID1 ESXi CPU usage and vSAN CPU usage are higher than RAID6; however, the difference is minimal when looking at the graphs, especially as in some cases both sets of tests can outperform the baseline across the different block sizes. For applications which need very fast read performance, RAID6 will always be the best option due to having more disks to read from than mirrored RAID1, so the encryption choice should be matched to the specific application requirement if reads are a priority.
This is an interesting feature of vSAN which came up in work recently. vSAN supports thin provisioning, which lets you use only as much capacity as currently needed and add more space in the future. One challenge with thin provisioning is that the VMDKs will not shrink when files within the guest O/S are deleted. An even bigger problem develops where many file systems will always direct new writes into free space rather than the old used space. Previous solutions to this involved manual intervention, such as a storage vMotion to external storage or powering off the machine. vSAN TRIM/UNMAP space reclamation solves this problem.
How does it work?
Modern guest O/S file systems have long had the ability to reclaim no-longer-used space using what are known as TRIM and UNMAP commands for the ATA and SCSI protocols respectively. vSAN 6.7U1+ now has full awareness of TRIM/UNMAP commands sent from the guest O/S and can reclaim previously allocated storage as free space.
Benefits
Faster repair means that blocks which have been reclaimed do not need to be rebalanced or remirrored in the event of a device failure
Removal of dirty cache pages means that read cache can be freed up in the DRAM client cache as well as the hybrid vSAN SSD cache for use by other blocks. If removed from the write buffer then this reduces the number of blocks copied to the capacity tier.
Performance Impact
It does carry some performance impact as I/O must be processed to track pages which are no longer needed. The largest impact will be the UNMAPs issued against the capacity tier. vSAN 7U1 includes performance enhancements which help provide the fairness of UNMAPs in heavy write environments.
There are some requirements for TRIM/UNMAP to work:
A minimum of virtual machine hardware version 11 for Windows
A minimum of virtual machine hardware version 13 for Linux.
The disk.scsiUnmapAllowed flag is not set to false. The default is an implied true. This setting can be used as a "kill switch" at the virtual machine level should you wish to disable this behaviour on a per-VM basis without using in-guest configuration to disable it. VMX changes require a reboot to take effect.
The guest operating system must be able to identify the virtual disk as thin.
After enabling TRIM/UNMAP at the cluster level, virtual machines must be power cycled (an example of enabling it is sketched below).
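As a sketch only: enabling the feature from PowerCLI and setting the per-VM kill switch might look like the following. The GuestTrimUnmap parameter and the cluster/VM names here are from memory and illustrative, so verify them against your PowerCLI version.
# Enable guest TRIM/UNMAP on the vSAN cluster (VMs need a power cycle afterwards)
Get-VsanClusterConfiguration -Cluster "vSAN-Cluster" | Set-VsanClusterConfiguration -GuestTrimUnmap:$true
# Per-VM kill switch - disable TRIM/UNMAP for a single VM
New-AdvancedSetting -Entity (Get-VM "TestVM") -Name "disk.scsiUnmapAllowed" -Value "FALSE" -Confirm:$false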
Monitoring TRIM/UNMAP
TRIM/UNMAP uses the following counters in the vSAN performance service for the hosts as seen in the figure below courtesy of VMware.
UNMAP Throughput – The measure of UNMAP commands being processed by the disk groups of a host.
Recovery UNMAP Throughput – The throughput of UNMAP commands being synchronized as part of an object repair following a failure or an absent object.
Using the advanced performance counters for a host will also show further UNMAP-related counters.
The Advanced Encryption Standard Instruction Set and the Intel Advanced Encryption Standard New Instructions allow specific Intel, AMD and other CPUs to perform extremely fast hardware encryption and decryption. AES (Advanced Encryption Standard) is a symmetric block cipher, which means that blocks of text with a size of 128 bits are encrypted, as opposed to a stream cipher where each character is encrypted one at a time. The algorithm takes a block of plain text and applies alternating rounds of substitution and permutation boxes to it, which are separate stages. In AES, the key size is 128, 192 or 256 bits depending on the strength of the encryption, with 10 rounds applied for a 128-bit key, 12 rounds for a 192-bit key, and 14 rounds for a 256-bit key, providing higher security.
The figure below shows that potential key combinations increase exponentially with the key size. AES-256 is impossible to break by a brute force attack based on current computing power, making it the strongest encryption standard. However, longer keys and more rounds place higher demands on performance. AES 256 uses 40% more system resources than AES 192, and is therefore best suited to high sensitivity environments where security is more important than speed.
AES Block Cipher Modes
There are several block cipher modes that can be used with AES.
Electronic Code Book
The simplest block cipher mode is Electronic Code Book. This cipher mode just repeats the AES encryption process for each 128-bit block of data. Each block is independently encrypted using AES with the same encryption key. For decryption, the process is reversed. With ECB, identical blocks of unencrypted data, referred to as plain text, are encrypted the same way and will produce identical blocks of encrypted data. This cipher mode is not ideal since it does not hide data patterns well.
Cipher Block Chaining
A newer block cipher mode was created called Cipher Block Chaining. CBC’s aim is to achieve an encryption method that encrypts each block using the same encryption key producing different cipher text, even when the plain text for two or more blocks is identical. Cipher Block Chaining addresses security weaknesses with ECB.
AES-XTS Block Cipher mode
AES-XTS is a newer block cipher mode designed to be stronger than other modes. It eliminates potential vulnerabilities from sophisticated side channel attacks used to exploit weaknesses within other modes. XTS uses two AES keys: one key performs the AES block encryption; the other is used to encrypt what is known as a tweak value. This encrypted tweak is further modified with a Galois polynomial function (GF) and XORed with both the plain text and the cipher text of each block. The GF function ensures that blocks of identical data will not produce identical cipher text. This achieves the goal of each block producing unique cipher text given identical plain text, without the use of initialization vectors and chaining. Decryption of the data is carried out by reversing this process.
What is AES-NI?
Intel AES New Instructions (Intel AES-NI) is a new encryption instruction set which contains improvements to the AES algorithm and accelerates the encryption of data in the Intel Xeon processor family and the Intel Core processor suite. AES is a symmetric block cipher that encrypts/decrypts data through several rounds. It is part of the FIPS standard.
There are six new instructions, implemented to perform some of the complex and performance intensive steps of the AES algorithm in hardware. Intel says that AES-NI can be used to accelerate the performance of an implementation of AES by 3 to 10x over a pure software implementation.
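As a quick check, most operating systems expose whether a processor advertises AES-NI; on a Linux machine, for example, the aes CPU flag indicates support (these commands are generic, not specific to any product mentioned in this post).
# Prints "aes" if the CPU advertises the AES-NI instruction set
grep -m1 -o aes /proc/cpuinfo
# lscpu also lists "aes" among the CPU flags when AES-NI is present
lscpu | grep -o aes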
How does it work?
A fixed block size of plain text is encrypted several times to produce a final encrypted output. The number of rounds (10, 12, or 14) used depends on the key length (128, 192, or 256). Each round feeds into the following round. Each round is encrypted using a subkey that is generated using a key schedule
What are the six new instructions?
The new instructions perform several computationally intensive parts of the AES algorithm using fewer clock cycles than a software solution.
Four of the new instructions accelerate the encryption/decryption of a round
Two new instructions are for round key generation.
Improved security
The new instructions also improve security by preventing side channel attacks on AES. Encryption and decryption are performed completely in hardware without the need for software lookup tables. By running in data-independent time and not using tables, they help eliminate the major timing and cache-based attacks that target table-based software implementations of AES. In addition, AES is simple to implement with reduced code size, which helps reduce the risk of introducing security flaws such as difficult-to-detect side channel leaks.
Most of the cloud providers such as Amazon, Google, IBM and Microsoft offer instances equipped with this Intel extension and use it as a security feature in their products. AES can be used in applications where confidentiality and integrity are of the highest priority. If cryptographic strength is a major factor in the application, AES is the best suited algorithm.
As seen in the diagram below, every Kubernetes cluster will have one or more control plane nodes and one or more worker nodes. The control plane manages the worker nodes and the Pods in the cluster. The worker nodes host the pods that are the components of the application workload. There is no cloud provider integration if the cluster is running on bare metal.
Components running on the control plane node include
etcd is the persistent datastore for Kubernetes which stores the cluster state.
kube-apiserver is the front end for the Kubernetes control plane; it exposes the Kubernetes API and is the only component which accesses etcd.
kube-scheduler assigns workloads to the worker nodes and decides which nodes pods will be run on.
The kube-controller-manager runs a collection of control processes to manage various resources. It monitors when nodes go down, maintains the correct number of pods, joins services and pods, and creates default accounts and API access tokens for new namespaces.
The cloud controller manager runs controllers which provision underlying infrastructure needed by workloads. It has a control loop to manage storage volumes if a workload needs persistent storage.
Components run on a worker node
Kubelet – Primary node agent which is responsible for spinning up containerized workloads that are assigned to its node.
Kube Proxy – Used for implementing Kubernetes services, the network components which connect workloads in the cluster.
A container runtime such as Docker
etcd
etcd is the database for Kubernetes. It is a distributed key value store. etcd clusters can be 3 or 5 nodes and each node has a copy of the datastore providing fault tolerance.
To maintain consensus, it uses an algorithm called Raft. The nodes can connect to each other on port 2380. To establish consensus they must maintain quorum which requires more than half the nodes in the cluster to be available. If Quorum is lost, the cluster cannot reach consensus and cannot process changes.
3 nodes can tolerate the loss of 1 node.
5 nodes can tolerate the loss of 2 nodes.
The diagram below shows the etcd members in their own dedicated cluster. The etcd client, which is the Kubernetes API server, connects to any of the members on the client port (2379), while the members talk to each other on the peer port (2380).
Alternatively, the etcd members can be co-located with the control plane components on the same machines. The choice will depend on cost, performance and capacity. It is not recommended to share the etcd cluster used by Kubernetes with other applications; it is worth dedicating an etcd installation to the Kubernetes cluster.
Using the Kubernetes command-line tool to find information
If we run kubectl get nodes, we can see the 3 control plane nodes.
If we run kubectl get pods -n kube-system, we can see the 3 etcd pods that are running on the control plane nodes in a co-located configuration.
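The exact output depends on the cluster, but the commands themselves look like this; the component=etcd label is what kubeadm-built clusters apply to the etcd static pods, so treat that filter as an assumption.
# List all nodes, including the control plane nodes
kubectl get nodes
# List the control plane pods, including the etcd members
kubectl get pods -n kube-system
# Narrow the output to just the etcd pods
kubectl get pods -n kube-system -l component=etcd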
Kubernetes API Server
The API server is where all control plane operations are exposed through the API. We use a tool called kubectl, which translates commands into HTTP REST-style API calls.
Custom resource definitions
Kubernetes is extensible via custom resource definitions. CRDs are used to create our own API types.
Once we create a CRD in Kubernetes, we can use it like any other native Kubernetes object, thus leveraging all the features of Kubernetes. A minimal example is shown below.
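A minimal, hypothetical CRD might look like this; the group, kind and schema below are invented purely for illustration.
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  # The name must be <plural>.<group>
  name: backups.example.com
spec:
  group: example.com
  scope: Namespaced
  names:
    plural: backups
    singular: backup
    kind: Backup
  versions:
    - name: v1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties:
                schedule:
                  type: string
Once applied, kubectl get backups works just like a built-in resource type.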
Kubernetes API resources
All components communicate with the API server. The API server's REST endpoint implements the OpenAPI specification. Objects created in the API are implementations of Kubernetes resources.
Resources include Pods, Services and Namespaces. Each resource contains a spec which defines the desired state of the resource and a status which includes the current state of the object in the system.
Resources can come under a cluster or namespace scope depending on their implementation. You can see below how some resources fit into each category.
API versioning
There are several API version levels in Kubernetes, each with a different meaning.
Alpha level
Could contain bugs
May be disabled by default
Lack of support
Beta Level
Tested
Enabled by default
Supported for a length of time to enable adoption and use
Details may change
GA Level
Stable
Will be available through several versions
Details are set
Authentication and Authorization
kubectl works by parsing a local configuration file containing authentication data and data about the request, and posting that JSON data to the API endpoint. The API server also answers requests from the controllers.
Firstly, the API server needs to authenticate that you are allowed to make a request, using one of several configurable authentication methods. Kubernetes doesn't have a concept of a user object and doesn't store user details; it uses authenticators for this task, which are configured by the administrators.
For authorization, the API leverages authorizers and authorization policies. To view these, you can type kubectl auth can-i --list to list the resources, the resource URLs and the verbs you can use within a cluster.
Admission control comes after authorization; at this point the request can be validated or mutated. Validation checks the request against validation logic and makes sure it is correct, while mutation looks at an object and potentially changes it.
The API server then does some spec validations. These are validation routines which check that everything in your spec is correct and notify you of typos and format errors.
Scheduler
The scheduler's job is to assign pods to nodes. When you create a pod request, you provide the pod's name and the image it will use. You don't have to define a node for the pod, but the option is there. The scheduler watches for new pod resources to be created. This watch functionality is exposed by the API server to the controllers, and the scheduler uses it to watch pods. When the scheduler finds a pod that doesn't have the node name field set, it determines where the pod should run and updates the pod resource, writing the chosen node into the node name field. The kubelet on the assigned node will then change the current state to the defined desired state.
The scheduler goes through filtering and scoring stages. Filtering comes first and filters out any nodes which cannot host the pod. For example, a node may carry something called a taint, describing something the node will not accept; unless a matching toleration is written in the pod's manifest, that node cannot host the pod. If the toleration is present in the manifest, it can.
A pod may request a certain amount of RAM and CPU or have a requirement for a GPU for example.
Once the filtering is complete, it moves on to the scoring stage. Scoring the candidates means finding the best host for scheduling the pod. Some pods will have an affinity section which sets a preference for scheduling the pod in a certain zone. Another scoring factor could be whether the node already has the container image being used by the pod. Lower workload utilisation on a node may also give it a preference. A pod spec pulling these scheduling inputs together is sketched below.
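A hypothetical pod spec fragment combining these scheduling inputs; the taint key, zone value and image are made up for illustration.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-worker
spec:
  containers:
    - name: app
      image: registry.example.com/gpu-app:1.0
      resources:
        requests:
          cpu: "2"        # filtering: node must have 2 CPUs unreserved
          memory: 4Gi     # filtering: node must have 4Gi unreserved
  tolerations:
    - key: "gpu"          # filtering: tolerate a hypothetical gpu taint
      operator: "Exists"
      effect: "NoSchedule"
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
        - weight: 1       # scoring: prefer, but do not require, zone-a
          preference:
            matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values:
                  - zone-a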
Customised scheduling – Policies and Profiles
You can configure the behaviour of the default scheduler using policies and profiles, with predicates (used for filtering) and priorities (used for scoring).
You can also build your own scheduler with custom scheduling logic instead of the existing scheduler.
Running more than one scheduler
The scheduler should be run in a highly available configuration at all times however, only one scheduler is active at any one time.
The first scheduler acquires a leader lease, by default recorded using an endpoint object. The other schedulers will be online but fail to acquire the leader lease. They periodically check whether the lease held by the active leader is still current and will acquire it if the leader becomes unavailable.
The Kube Controller Manager
The Kube Controller Manager runs the core control loops for the Kubernetes control plane. There are many different controllers. Several are responsible for maintaining the desired state of common resources in a Kubernetes cluster. Each controller has a specific set of functionality which depend on the resource they manage.
A control loop is a non terminating loop which regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the API server and makes changes to move the current state towards the desired state.
A controller is responsible for managing a resource, and it will have a watch on the resource kind for which it is responsible. The watch is a continuous connection with the kube-apiserver where notification of changes is sent to the controller. It will then work on changing the existing state to the desired state, and it will keep trying if it can't finish the first time.
If a replica set is created then we are duplicating pods; say we want 4 replicas as our desired state, the replica set controller creates 4 pods and the scheduler assigns them to nodes. A deployment controller is what creates the replica sets.
Like the scheduler, when used in a highly available configuration, the controller manager uses a leader election to ensure only one instance is actively managing resources in a cluster at a time.
Cloud Controller Manager
The cloud controller manager is similar to the kube controller manager. It is a collection of controllers with the same principles around control loops and leader elections. The Cloud Controller Manager will not be found in every cluster if you’re running on bare metal. The cloud controller manager lets you link your cluster into your cloud provider’s API and separates out the components that interact with that cloud platform from components that just interact with your cluster.
Controllers inside the Cloud controller manager
Node Controller: The node controller is responsible for creating node objects when new servers are created in your cloud infrastructure. The node controller obtains information about the hosts running inside your tenancy with the cloud provider.
Route controller: The route controller is responsible for configuring routes in the cloud correctly so that containers on different nodes in your Kubernetes cluster can communicate.
Service Controller: Services integrate with cloud infrastructure components such as managed load balancers, IP addresses, network packet filtering, and target health checking. The service controller interacts with your cloud provider’s APIs to set up load balancers and other infrastructure components.
Examples
If you have a workload that you would like to expose to requests from outside, one way to do this is to put it behind a Kubernetes service such as a load balancer. If you have a cloud controller manager set up, it will configure a load balancer through your cloud providers API and configure it to route traffic to the pods for your workload.
Another example could be a workload which requires persistent storage. You can set up storage classes which leverage a provisioner from a cloud provider. This allows you to provision backing storage volumes for your workload on demand for example by referencing a storage class from a Kubernetes stateful set. The cloud controller manager will use the cloud providers API to provision the storage volumes when needed so they can be mounted in a workloads pod.
Kubelet
The kubelet is the primary Kubernetes node agent. It runs on every node in the cluster. It’s responsible for running the containers for the pods which are scheduled to its node. The kubelet for each node keeps a watch on pod resources in the api server.
The kubelet is another Kubernetes controller which provides an interface between the Kubernetes control plane and the container runtime on each server in the cluster.
Whenever the scheduler assigns a pod to a node in the api server, the kubelet for that node reads the pod spec then instructs the container runtime to spin up the container to build that spec. The container runtime then downloads the container images if they’re not there, then starts the container. The kubelet instructs the container runtime using the container runtime interface or CRI.
The kubelet is the only Kubernetes component that does not run in a container. The kubelet along with the container runtime are installed and run directly on the machine that is the node in the cluster.
The other components typically run in containers as Kubernetes pods, although this is only a general convention.
The kubelet gets notifications from the api server on what pods to run. The api server and other control plane components are themselves created using static pod manifests. When you start the kubelet you can set a path to the directory or file that contains these static pod manifests. The kubelet tells the container runtime to spin up the containers for those pod manifests and monitors them for changes. It can also make HTTP requests to remote endpoints or listen for HTTP connections to get pod manifests, but the most common method is static pod manifests on a local file system.
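As a hedged sketch, on a kubeadm-built node the static pod path is normally set in the kubelet configuration file (the path below is the common default and may differ in your environment):

# fragment of /var/lib/kubelet/config.yaml
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
staticPodPath: /etc/kubernetes/manifests

Any pod manifest dropped into that directory is picked up by the kubelet and run directly against the container runtime, which is how the api server and the other control plane pods are bootstrapped.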
Kube Proxy
Similar to the kubelet, the kube proxy runs on every node in the cluster. Unlike the kubelet, the kube proxy runs in a Kubernetes pod.
kube-proxy enables essential functionality for Kubernetes services. If services didn't exist, then when a client application needed to connect to server pods in a cluster it would need to use the pods' IPs on the pod network, and it would have to retrieve and maintain a list of all the pod addresses, which is unnecessary work. In Kubernetes it is likely that pods will be created and destroyed frequently, so we need a better way to manage this. The service provides a stable IP address, for example 10.10.10.1.
A controller keeps track of the pods associated with the service and adds and removes them from the backend pool as needed. The client just needs the address of the service; the rest is taken care of by the Endpoints resource, which is usually created on your behalf by a controller. If you create a service with a selector which references a label applied to the pods, the Endpoints resource is created for you, and the addresses of the backend pods are maintained there. If the pool of pods changes, the endpoints are updated and client requests are routed appropriately. It looks as if the service is a proxy that load balances requests to the backend, but in Kubernetes it works slightly differently.
There is an endpoints controller in the kube controller manager that manages the Endpoints resources and the associations between services and pods. Each node in the cluster runs kube-proxy, which watches Service and Endpoints resources. When a change occurs which needs updating, kube-proxy updates rules in iptables, a network packet filtering utility which allows network rules to be set in the network stack of the Linux kernel. kube-proxy offers alternatives to iptables, but this is the most common option.
Now when a client pod sends a request to the service's IP, it gets routed by the kernel to one of the pod IPs depending on the rules which have been set by kube-proxy. When using iptables, the pod is selected from the pool at random. For more control you would have to use IPVS (IP Virtual Server), which implements layer 4 load balancing in the Linux kernel.
The service's IP is a virtual IP and you won't get a response if you ping it. It is essentially a key in the rules set by iptables which give network packet routing instructions to the host's kernel. The client pod can use the service IP just as if it were calling an actual pod IP.
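As a simplified illustration only (real chain names are hashed and the generated rules are more involved), the iptables rules kube-proxy programs for a service on 10.10.10.1 with three backend pods look roughly like this:

-A KUBE-SERVICES -d 10.10.10.1/32 -p tcp --dport 80 -j KUBE-SVC-EXAMPLE
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.333 -j KUBE-SEP-POD1
-A KUBE-SVC-EXAMPLE -m statistic --mode random --probability 0.500 -j KUBE-SEP-POD2
-A KUBE-SVC-EXAMPLE -j KUBE-SEP-POD3
-A KUBE-SEP-POD1 -p tcp -j DNAT --to-destination 10.244.1.10:8080

The kernel picks one of the KUBE-SEP chains at random and DNATs the packet to that pod's IP, which is why the service IP itself never answers directly.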
Sample manifests
Below are sample manifests for a Deployment and a Service resource.
The deployment manifest uses the name test-deployment with 3 replica pods in the deployment. The selector indicates that this deployment will manage pods with the label app: test, and in the template we give that label to the pods. Each pod consists of a single container named sample-container which runs the nginx container image and listens on port 8080. The service manifest will also spin up a load balancer with the cloud provider to expose the pods to requests from outside the cluster.
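The following is a sketch of what those two manifests might look like, based on the description above (the service name and external port are assumptions):

deployment.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: test-deployment
spec:
  replicas: 3
  selector:
    matchLabels:
      app: test
  template:
    metadata:
      labels:
        app: test
    spec:
      containers:
      - name: sample-container
        image: nginx
        ports:
        - containerPort: 8080

service.yaml

apiVersion: v1
kind: Service
metadata:
  name: test-service        # name assumed
spec:
  type: LoadBalancer
  selector:
    app: test
  ports:
  - port: 80                # external port assumed
    targetPort: 8080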
kubectl apply -f deployment.yaml -f service.yaml
kubectl translates the command above into a REST API call to the Kubernetes API server. The API server authenticates and authorizes the user and then applies any admission control operations such as pod security policies. If admission fails, the resource will not be created.
Once this is complete the various controllers in the system are notified by the watch mechanism and work to change the existing state to the desired state.
The Deployment controller creates the corresponding replica set with the 3 replicas which were defined in the manifest
The Replication controller is notified of the new replica sets and in response creates the 3 separate pod resources using the pod template
The Endpoints controller will create the endpoints resource which connects the individual pods to the service by the pods label
Another controller which is notified in response to the resources being created is the service controller. The previous controllers are part of the core Kubernetes controllers in the kube controller manager; this one is part of the cloud controller manager, which is responsible for integrations with the underlying cloud provider's infrastructure. The service controller is notified by its watch when the Service resource is created. It notices that the spec includes a load balancer and responds by calling the cloud provider's API to have a load balancer provisioned to route traffic to the cluster and the associated pods.
Next, the scheduler is watching for new pods, and when the replication controller creates them the scheduler is notified and responds by finding worker nodes for the containers to run on to fulfil the pod spec. Once the assignments have been made, the kubelet on each assigned node instructs the local container runtime to create the requested containers from the nginx image defined in the deployment spec.
Now that the containers are up and running we need network access to them, and this is what kube-proxy is used for. It watches the Endpoints resource which connects the service to the pods and updates iptables rules on its node to ensure that traffic sent to the service's IP gets routed to one of the pod IPs. This covers client requests from outside the cluster, from inside the cluster, and through the cloud provider's load balancer.
VMware Cloud Foundation provides a software-defined stack including VMware vSphere with Kubernetes, VMware vSAN, VMware NSX-T Data Center, and VMware vRealize Suite, delivering a complete set of software-defined services for compute, storage, network security, Kubernetes management, and cloud management.
What is vCloud Foundation Lab Constructor?
VLC is an automated tool built by Ben Sier and Heath Johnson that deploys an entire nested Cloud Foundation environment onto a single physical host or vSphere cluster. It is an unsupported tool, but it allows you to learn about VCF with a greatly reduced set of resource requirements. VLC deploys the core SDDC components in the smallest possible form factor; specifically, components like vCenter and the vRealize Log Insight nodes are deployed in the tiny and xsmall form factors as specified in a JSON config file. With these reductions, deploying the VLC nested lab components becomes possible on a single physical host with 12 CPU cores, 128 GB RAM, and 2 TB of SSD disk.
An overall view of what VLC looks like
Download VLC
You will need to register at http://tiny.cc/getVLC and then you will be provided with a zip file.
vExpert Program – If you are a vExpert you can log in and download the software for free; however, there is no NSX-T license available here, only via VMUG I believe.
VCF customers – Can download what you need from the My VMware Portal
Pre-requisites
Step 1
You need a single physical host running ESXi 6.7+ with 12 cores, 128 GB RAM and 800 GB SSD. This is the minimum requirement for using VLC and you will need to configure the host in 1 of 4 configurations below.
Standalone ESXi (No vCenter) using a vSS
ESXi host with vCenter using vSS
Single ESXi host in a cluster using vDS
Multiple ESXi hosts in a cluster using vDS
If you are running multiple hosts in a vSAN cluster then run the following command on all hosts because you will be in effect nesting a vSAN within a vSAN
esxcli system settings advanced set -o /VSAN/FakeSCSIReservations -i 1
If you are deploying to a single physical host, don't worry about physical VLANs as all the traffic will reside on that single physical host. If you are deploying to a vSphere cluster, you'll need at least 1 VLAN (10 is the default) physically configured and plumbed up on your physical switch to all hosts in that cluster. If you intend to do anything with NSX (AVNs are the common thread), you'll also need 3 additional VLANs (11-13 are the default).
If in a cluster configuration, disable all HA and DRS and vMotion on the physical host(s).
You will need a virtual switch (VSS or vDS) with the MTU set to 9000
On the vSwitch, create a portgroup for VCF with VLAN Trunking (0-4094) enabled. On the portgroup (not the switch), set the security settings required for nested ESXi (typically Promiscuous Mode, MAC Address Changes and Forged Transmits all set to Accept).
I chose to deploy my lab on one host with a vDS switch.
Step 2
Build a Windows-based jump host on this ESXi host as a VM and install the following software.
Windows 10/2012/2016 (Older versions are not supported)
Powershell 5.1+
PowerCLI 11.3+
OVFTool 4.3+ (64bit)
.NET Framework
VMXNET3 NICs – 1500 MTU
On this jump host, attach two virtual NICs.
Attach one NIC to your local LAN network so you can RDP to it.
Attach the second NIC to the VCF PortGroup created in Step 1 and configure it with the IP 10.0.0.220. Set the DNS on the second NIC to 10.0.0.221. The 10.0.0.221 address will be assigned to the Cloud Builder appliance by default. VLC modifies the Cloud Builder appliance so that it provides specific services, like DNS, for the nested environment, so using this IP for DNS will allow you to access the nested VCF environment when using the default configuration file in Automated mode.
This second NIC will also need to be configured in the NIC properties to use the VLAN of your management network. In the default Automated VLC configuration this is VLAN 10.
The jump host should look like the below
On the jump host, do the following
Disable Windows Firewall.
Turn off Windows Defender Real-time Scanning. Note: this has a habit of resetting after reboots of the Windows VM.
Step 3
On the Windows jump host, create a folder for VLC. This must be on a locally attached disk (e.g. “C:\VLC\”) as mapped network drives will fail.
Download the VCF Software (Cloud Builder OVA) into this folder.
You used to have to download the vSphere ESXi ISO that matches the version required for VCF. The easiest method to do this was to simply copy the .iso file located on the Cloud Builder appliance but to make this even easier, VLC now provides an option in the setup GUI where it will download this file directly from the Cloud Builder appliance that it deploys.
Download and extract the VLC package to this folder as well
Install anything extra you need like Putty, WinSCP and Notepad++
Step 4
We now need to edit one of two files. You have a choice of Automated_AVN or Automated_No_AVN when deploying VLC.
Multiple sample bringup JSON files are provided with VLC. The selection of the bringup JSON file dictates whether AVN will be implemented at bringup or not. Regardless of which bringup file is used, you will need to edit it to add your license keys, as the default configuration files do not include any. Using a text editor, edit the appropriate file with an ESXi license, vCenter license, NSX-T license and vSAN license.
Step 5
Either open a Powershell window (as Administrator) and execute the VLC PowerShell Script “C:\VLC\VLCGUi.ps1” or right click on the VLCGUI.ps1 and select ‘Run with PowerShell’.
VLC UI will Launch
Once the above screen completes, you will see the below screen. Select the “Automated” Button. This will build your first four hosts for the Management Domain. This is done by creating four virtual nested ESXi hosts. These nested hosts are automatically sized and created for you. You are able to configure the hostnames and IP addresses to be used within the configuration file that you provide the VCF Lab Constructor
Click on the field titled ’VCF EMS JSON’ and select the JSON file that you just entered the license keys for.
Click on the CB OVA Location field to select the location of the CB OVA.
(Optional) Enter the External GW for the Cloud Builder Appliance to use. This allows you to point to a gateway that will allow internet access.
Click the Connect Button
VLC will connect to the host or vCenter you specified and will validate all necessary settings. It will then populate the Cluster, Network, and Datastore fields with information gathered from your environment.
Select the cluster, network (port group) and datastore that you desire VLC to install the nested lab to. The Cluster field will not display any value if you are deploying directly to a single ESXi host.
** If your port group does not show up, you need to check to see if the previous security settings have been set explicitly on the port group and not just the switch.
Click the yellow Validate button
As VLC validates the information, it will mark the fields in green. When everything has been validated, the Validate button will change to a green button that says ‘Construct’.
Note the Bring-up box. Bring-up is a fully documented process in the installation of VCF. Using the VCF Lab Constructor you can do this manually so you can follow the steps of the official VMware documentation, or if you check the box in the GUI the VCF Lab Constructor will complete bring-up for you automatically.
Click Construct to begin the deployment of VMware Cloud Foundation.
The process will take some time to complete. On average, expect to wait three and a half hours for the deployment process to complete.
Logging
During bringup, logs can be found on the Cloud Builder appliance in the /var/log/vmware/vcf/bringup directory – check vcf-bringup-debug.log in that directory.
For problems deploying VC and PSC on bringup look in /var/log/vmware/vcf/bringup/ci-installer-xxxx/workflow_xxxx/vcsa-cli-installer.log
After bringup you can look at the SDDC Manager for logs. They are all rooted in the /var/log/vmware/vcf folder. Depending on what operation you are performing, you can look into one of the folders below.
Domain Manager – Used when creating/deleting/expanding/shrinking workload domains: /var/log/vmware/vcf/domainmanager/domainmanager.log
Operations Manager – Used when commissioning/decommissioning hosts and for resource utilization collection: /var/log/vmware/vcf/operations/operationsmanager.log
LCM – Used for lifecycle management activities like downloading bundles and applying updates: /var/log/vmware/vcf/lcm/lcm.log
Accessing the VCF UI
To gain network access when the VCF components are installed at layer 3, your jump host will need a NIC with multiple IP addresses or you will need multiple NICs. Be aware that because everything is nested inside layer 2, all network traffic is broadcast back up to the layer 1 port groups. Simply having your jump host on this subnet or port group and listening on the default VCF subnet (i.e. 192.168.0.0) will allow you to access everything at layer 3. The jump host can also be nested at layer 1, or be a physical desktop that has access to the same subnet. Nesting it at layer 1 gives the best performance.
The below diagram courtesy of VMware shows the networks which are created
Further tasks– Expanding the number of hosts
Using the Expansion pack option will now allow you to scale out hosts
When clicking on the Expansion pack option, you get the below screen
When you have used the Automated method to deploy your environment, VLC has configured the Cloud Builder appliance to provide essential infrastructure services for the management domain. Before adding additional hosts, you will need to add the appropriate DNS entries to the Cloud Builder configuration. You can use the information below, or further down the post I go through using the Expansion Pack option when running the VLCGui.ps1 script again and modifying some VLC files.
Adding DNS entries for extra hosts
Use SSH to connect to your Cloud Builder VM and log in using the username (admin) and the password that you specified in the VLC GUI when you deployed the environment.
You will need to edit the DNS “db” file for the zone specified. As an example, assume that the domain ‘vcf.sddc.lab’ was used during the creation of the nested environment. This would mean the zone file would be located here: /etc/maradns/db.vcf.sddc.lab
After making your changes and saving the file, you will need to reload the maradns and maradns.deadwood services. MaraDNS takes care of forward lookups and Deadwood takes care of reverse DNS.
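A minimal sketch of the reload, assuming the Cloud Builder appliance manages these as systemd services (check the service names on your build if this fails):

systemctl restart maradns
systemctl restart maradns.deadwood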
You would follow this same procedure for adding DNS entries for vRSLCM, vROps, vRA, Horizon, or any other component. Note: certain software (like vROps, vRA, and Horizon) is not automated in VCF 4.0 via SDDC Manager. You may need to follow the manual guidance in the VCF documentation to deploy these software packages.
Logging into the SDDC
From the jump host you can log into the following
Hosts = 10.0.0.100-103
vCenter IP = 10.0.0.12 (https://vcenter-mgmt.vcf.sddc.lab)
Have a click around and get familiar with the user interface
What if we want to create a workload domain?
The initial part of this script deploys the 4 node ESXi management domain so what if we want to create some more hosts for a workload domain for the products below?
K8s
Horizon
HCX
vRealize suite
Step 1
First of all we are going to use the below 3 files and add DNS entries
Open the additional_DNS_Entries.txt file and add in the 3 new hosts. In my case it looks like this.
The next file to look at is the add_3_BIG_hosts_bulk_commission VSAN.json file. This is used by the vCloud Foundation software itself.
So now we need to run the VLCGui.ps1 script again located in c:\VLC to get to the point where we see the expansion pack option below.
Run Powershell and run .\VLCGui.ps1
Click on Expansion pack
Add in 10 for the main VLAN
In the Addtl Hosts JSON file box, select your add_3_BIG_hosts.json
In the ESXi ISO Location, navigate to c:\VLC\cb_ESX_iso and select the ESXi image
Next add in the host password which is VMware123
Add in the NTP IP which points to the CloudBuilder appliance on 10.0.0.221
Add in the DNS IP which points to the CloudBuilder appliance on 10.0.0.221
Add in the domain name for the lab which is vcf.sddc.lab
Next put in your vCenter IP, username and password and click Connect
When connected, choose your cluster, network and datastore like you did when configuring the initial management host deployment, then click Validate and everything should be green
Click Construct
You will now see in your vCenter the extra hosts being created
Once finished, you should see the below message in PowerShell. You can see it took a total of around 8 minutes.
The hosts are now ready to be commissioned into SDDC Manager so we go back to sddc-manager.vcf.sddc.lab and click on Commission hosts in the top right hand corner.
Say Yes to the entire checklist and click Proceed
Next we will use the Import button to add the additional hosts.
Choose the add_3_BIG_hosts_bulk_commission VSAN.json file
Click upload
If you have a problem where you get the below message then follow the steps below
Log into the Cloudbuilder appliance using root and VMware123! and run the below command
Press i to insert new data and add in your new hosts in the same format as the other entries
Go back to SDDC manager and try an upload again and everything should be fine.
Select all hosts, click on the tickbox on the column saying Confirm FingerPrint and click Validate
Click Next
Review
You will see a message in SDDC Manager saying the hosts are being commissioned
Once commissioned, you will see them as unassigned hosts
Following on from this, I will be following the VLC manual to enable vSphere with Kubernetes on VLC
Enabling Kubernetes on VLC
In vcenter-mgmt.vcf.sddc.lab, set DRS to Conservative on mgmt-cluster
In vcenter-mgmt.vcf.sddc.lab, set VM Monitoring to Disabled
In vcenter-mgmt.vcf.sddc.lab, remove the CPU and memory reservation on nsx-mgmt-1
Make sure you have enough licensing available and add additional licenses if required.
We can now create a VI workload domain with the 3 extra hosts we added before. In sddc-manager.vcf.sddc.lab, click on the menu and select Inventory > Workload domains and click the blue + Workload Domain button. Then select the dropdown VI – Virtual Infrastructure
Select vSAN on Storage Selection and click begin
Enter a Virtual Infrastructure Name and Organisation name
Enter a cluster name and click Next
Fill in the details of the workload domain vCenter. I have screenshotted the file additional_DNS_Entries.txt from the c:\VLC folder next to this for reference. I used the password used throughout this lab which is VMware123! to keep everything easy.
Next we need to fill in the NSX information. Again the information has come from the additional_DNS_Entries.txt from the c:\VLC folder and the password needs to be stronger so I have used VMware123!VMware123!
Leave the vSAN storage parameters as they are
On the Host selection page, select your 3 new unassigned hosts
Put in your license keys
Check the object names
Review the final configuration
You will start to see the vcenter-wld appliance being deployed
You will see the workload domain activating
When it is finally done, we should see the following
If we log out of the vCenter and back in then we will see the linked mgmt and workload vCenters under one page
Edit the cluster settings and change the migration threshold to Conservative
In the HA settings, set the VM monitoring to disabled.
Edit the settings on the nsx1-wld to change the CPU and memory reservation to 0
In the sddc-manager.vcf.sddc.lab > Workload Domains > WLD-1 – Actions – Add Edge Cluster
Select All on the Edge Cluster Prerequisites page
Put in the Edge Cluster details (I followed this from the lab guide)
Select Workload domain on the specify use case
Next, add the first Edge node, once you have filled everything in, select the button to add the second edge node
Add the second Edge node and click Add Edge node
Once complete, you should see that both Edge nodes are added successfully
On the Summary page, double check all your details are correct and click Next
Validation will run
Validation should succeed
Click Finish and check out SDDC Manager where you should see a task saying Adding edge cluster
You will see the Edge servers deploying in vCenter if you check
When complete it should say successful
In the vCenter, edit the settings for both edge1-wld and edge2-wld to change the CPU shares to normal and the memory reservation to 0
Go to the SDDC Manager Dashboard, select Solutions and deploy Kubernetes – Workload Management.
Read the Pre-requisites and select all.
Select Workload domain, then cluster01 and next
It will then go through a process of validation
Read the Review page and click Complete in vSphere
In vcenter-wld.vcf.sddc.lab > Workload Management > Select cluster01 and click Next
Select Tiny and click Next
Enter network info and click next
Select storage policy for each component
Review and Confirm
In vcenter-wld.vcf.sddc.lab, you can monitor the tasks
The below actions then take place
The deployment of 3 x Supervisor Control Plane VMs
The creation of a set of SNAT rules (Egress) in NSX-T for a whole array of K8s services
The creation of a Load Balancer (Ingress) in NSX-T for the K8s control plane
The installation of the Spherelet on the ESXi hosts so that they behave as Kubernetes worker nodes
In vCenter, we can see that Workload Management has been successfully created
Add 2 routes to the jump host
We now need to create a content library
Select subscribed content library and click Next
Accept the certificate
Select vSAN datastore and click Next
Click Finish
Go to vCenter > Home > Workload management > Create Namespace
I created a namespace called test-namespace
Download the kubectl plugin from the CLI Tools link. If you click Open, you will get a page where you can click Download CLI plugin. Unzip it; I unzipped mine to c:\VLC\vsphere-plugin
Create permissions on the name space
Set Storage Policies
Open Command prompt, Navigate to c:\VLC\vsphere-plugin\bin
Log in as administrator@vsphere.local to 10.50.0.1, which is the first IP address of the ingress CIDR block you provided. This IP is assigned to the load balancer in NSX that points to the supervisor cluster.
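As a hedged example, the login and context switch from the extracted plugin folder look roughly like this (flag names may vary slightly between plugin versions):

kubectl vsphere login --server=10.50.0.1 --vsphere-username administrator@vsphere.local --insecure-skip-tls-verify
kubectl config get-contexts
kubectl config use-context test-namespace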
This is where I need to read up more on Kubernetes and vCloud in general to do anything else! 😀
VMware Tanzu is a portfolio of services for modernizing Kubernetes controlled container-based applications and infrastructure.
Application services: Modern application platforms
Build service: Container creation and management. Heptio, Bitnami and Pivotal come under this category. Bitnami packages and delivers 180+ Kubernetes applications, ready-to-run virtual machines and cloud images. Pivotal controls one of the most popular application frameworks, “Spring”, and offers customers the Pivotal Application Service recently announcing that PAS and its components, Pivotal Build Service and Pivotal Function Service are being evolved to run on Kubernetes.
Application catalogue: Production ready, open source containers
Data services: Cloud native data and messaging including Gemfire, RabbitMQ and SQL
Kubernetes Grid: Enterprise-ready runtime
Mission Control: Centralised cluster management
Observability: Modern app monitoring and analytics
Service mesh: App wide networking and control
VMware Tanzu services
What is Tanzu Mission Control?
VMware Tanzu Mission Control is a SaaS-based control plane which allows customers to manage all their Kubernetes clusters across vSphere, VMware PKS, public clouds, managed services and packaged distributions from a central single point of control and single pane of glass. This allows applying policies for access, quotas, back-up, security and more to individual clusters or groups of clusters. It supports a wide array of operations such as lifecycle management, including initial deployment, upgrade, scale and delete. This is achieved via the open source Cluster API project.
As these environments evolve, there can be a proliferation of containers and applications, so how do you keep this all under control, allowing developers to do their jobs and operations to keep the infrastructure in check? Tanzu Mission Control helps with the following:
Map enterprise identity to Kubernetes RBAC across clusters
Define policies once and push them across clusters
Manage cluster lifecycle consistently
Unified view of cluster metrics, logs and data
Cross cluster cloud data
Automated policy controlled cross cluster traffic
Monitor Kubernetes costs
What is Project Pacific?
Project Pacific is an initiative to embed Kubernetes into the control plane of vSphere for managing Kubernetes workloads on ESXi hosts. The integration of Kubernetes and vSphere happens at the API and UI layers, but also at the core virtualization layer, where ESXi runs Kubernetes natively. A developer will see and use Project Pacific as a Kubernetes cluster, while an IT admin will still see the normal vSphere infrastructure.
The control plane will allow the deployment of
Virtual Machines and cluster of VMs
Kubernetes Clusters
Pods
The Supervisor cluster
The control plane is made up of a supervisor cluster using ESXi as the worker nodes instead of Linux. This is carried out by integrating a Spherelet directly into ESXi. The Spherelet doesn't run in a VM; it runs directly on ESXi. This allows workloads or pods to be deployed and run natively in the hypervisor, alongside normal virtual machine workloads. A Supervisor Cluster can be thought of as a group of ESXi hosts running virtual machine workloads while at the same time acting as Kubernetes worker nodes and running container workloads.
vSphere Native Pods
The supervisor cluster allows workloads or pods to be deployed. Native pods are actually containers that comply with the Kubernetes Pod specification. This functionality is provided by a new container runtime built into ESXi called CRX. CRX optimises the Linux kernel and hypervisor and removes some of the traditional heavy config of a virtual machine enabling the binary image and executable code to be quickly loaded and booted. The Spherelet ensures containers are running in pods. Pods are created on a network internal to the Kubernetes nodes. By default, pods cannot talk to each other across the cluster of nodes unless a Service is created. A Service in Kubernetes allows a group of pods to be exposed by a common IP address, helping define network routing and load balancing policies without having to understand the IP addressing of individual pods
CRX – Container runtime for ESXi
Each virtual machine has a vmm (virtual machine monitor) and vmx (virtual machine executable) process that handle all of the other subprocesses needed to support running a VM. To implement Kubernetes, VMware introduced a new process called CRX (the container runtime executive) which manages the processes associated with a Kubernetes pod. Each ESXi server also runs an equivalent of the kubelet in standard Kubernetes, called the Spherelet.
A CRX instance is a specific form of VM which is packaged with ESXi and provides a Linux Application Binary Interface (ABI) through a very isolated environment. VMware supplies the Linux kernel image used by CRX instances. When a CRX instance is brought up, ESXi pushes the Linux image directly into the CRX instance. Since it is essentially pared down from a normal VM, most of the other features have been removed and it can launch in less than a second.
CRX instances have a CRX init process which provides the endpoint for communication with ESXi and allows the environment running inside the CRX instance to be managed
Namespaces
A Namespace in the Kubernetes cluster includes a collection of different objects like CRX VMs or VMX VMs. Namespaces are commonly used to provide multi-tenancy across applications or users, and to manage resource quotas
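On the Kubernetes side, quotas inside a namespace surface as standard ResourceQuota objects; a generic sketch (the name and limits are illustrative, and in vSphere with Kubernetes these are normally driven from the vCenter UI) could look like this:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-quota        # illustrative name
  namespace: test-namespace
spec:
  hard:
    requests.cpu: "4"
    requests.memory: 8Gi
    pods: "20"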
Guest Kubernetes Clusters
It is important to understand that the Supervisor Cluster itself does not deliver regular Kubernetes clusters. The supervisor Kubernetes cluster is a specific implementation of Kubernetes for vSphere which is not fully conformant with upstream Kubernetes. If you want general purpose Kubernetes workloads, you have to use Guest Clusters. Guest Clusters in vSphere use the open source Cluster API project to lifecycle-manage Kubernetes clusters, which in turn uses the VM Operator to manage the VMs that make up a guest cluster.
What is Cluster API?
This is an Open source project for managing the lifecycle of a Kubernetes cluster using Kubernetes itself. You start with the management cluster which gives you an API with custom resources or operators.
TPM (Trusted Platform Module) is an industry standard for secure cryptoprocessors. TPM chips are serial devices found in most of today’s desktops, laptops and servers. vSphere 6.7 supports TPM version 2.0. Physical TPM chips are secure cryptoprocessors that enhance host security by providing a trust assurance in hardware compared to software. A TPM 2.0 chip validates an ESXi host’s identity. Host validation is the process of authenticating and attesting to the state of the host’s software at a given point in time. UEFI secure boot, which ensures that only signed software is loaded at boot time, is a requirement for successful attestation. The TPM 2.0 chip records and securely stores measurements of the software modules booted in the system, which vCenter Server verifies.
What is the functionality of TPM?
Random number generator: prevents the platform from relying on software pseudo-random number generators to generate cryptographic keys (except for the primary keys generated from seeds in TPM 2.0).
Symmetric and asymmetric cryptographic key generation
Encryption/decryption.
It also provides secure storage capabilities in two memory types, Volatile and NonVolatile memory (NVRAM) for the following elements:
Primary Storage Key (known as the Storage Root Key in TPM 1.2). This is the root key of a key hierarchy used for the key derivation process, and it is stored in persistent memory.
Other entities, such as Indexes, Objects, Platform Configuration Registers (PCR), Keys, Seeds and counters.
What is vTPM?
The Virtual Trusted Platform Module (vTPM) feature lets you add a TPM 2.0 virtual cryptoprocessor to a virtual machine. A vTPM is a software-based representation of a physical Trusted Platform Module 2.0 chip.
Differences Between a Hardware TPM and a Virtual TPM
You use a hardware Trusted Platform Module (TPM) as a cryptographic coprocessor to provide secure storage of credentials or keys. A vTPM performs the same functions as a TPM, but it performs cryptographic coprocessor capabilities in software. A vTPM uses the .nvram file, which is encrypted using virtual machine encryption, as its secure storage
A hardware TPM includes a preloaded key called the Endorsement Key (EK). The EK has a private and public key. The EK provides the TPM with a unique identity. For a vTPM, this key is provided either by the VMware Certificate Authority (VMCA) or by a third-party Certificate Authority (CA). Once the vTPM uses a key, it is typically not changed because doing so invalidates sensitive information stored in the vTPM. The vTPM does not contact the CA at any time
A physical TPM is not designed for thousands of VMs to store their credentials; its non-volatile secure storage is tiny, measured in kilobytes.
How does a physical TPM work with vCenter?
When the host boots, it loads UEFI, which checks the boot loader, and ESXi starts loading. VMKBoot communicates with the TPM, and information about the host is sent to vCenter to check everything is correct.
How does a vTPM work?
The specific use case for a vTPM on vSphere is to support Windows 10 and 2016 security features.
How do you add a vTPM?
You can add a vTPM to a virtual machine in the same way you add virtual CPUs, memory, disk controllers, or network controllers. A vTPM does not require a physical Trusted Platform Module (TPM) 2.0 chip to be present on the ESXi host. However, if you want to perform host attestation, an external entity, such as a TPM 2.0 physical chip, is required.
Note: If you have no KMS Server added to vCenter Server, even with a new virtual machine that has EFI and secure boot enabled, you will not see the option to add the Trusted Platform Module.
When added to a virtual machine, a vTPM enables the guest operating system to create and store keys that are private. These keys are not exposed to the guest operating system itself, reducing the virtual machine's attack surface; enabling a vTPM greatly reduces the risk of compromising a guest OS. The keys can be used only by the guest operating system for encryption or signing. With an attached vTPM, a third party can remotely attest to (validate) the identity of the firmware and the guest operating system.
You can add a vTPM to either a new virtual machine or an existing virtual machine. A vTPM depends on virtual machine encryption to secure vital TPM data. When you configure a vTPM, VM encryption automatically encrypts the virtual machine files but not the disks. You can choose to add encryption explicitly for the virtual machine and its disks.
You can also back up a virtual machine enabled with a vTPM. The backup must include all virtual machine data, including the *.nvram file which is the storage for the vTPM. If your backup does not include the *.nvram file, you cannot restore a virtual machine with a vTPM. Also, because the VM home files of a vTPM-enabled virtual machine are encrypted, ensure that the encryption keys are available at the time of a restore.
What files are encrypted and not encrypted?
The .nvram file
Parts of the VMX file
Swap, .vmss, .vmsn, namespacedb
DeployPackage (used by Guest Customization)
Log files are not encrypted.
Virtual machine requirements:
EFI firmware (set in VM Settings > VM Options > Boot Options > Firmware)
Hardware version 14
vCenter Server 6.7 or greater.
Virtual machine encryption (to encrypt the virtual machine home files).
Key Management Server (KMS) configured for vCenter Server (virtual machine encryption depends on KMS)
Windows Server 2016 (64 bit)
Windows 10 (64 bit)
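With the requirements above in place, a vTPM can be added from the VM's Edit Settings dialog, or scripted. A minimal hedged sketch, assuming a recent PowerCLI release that includes the vTPM cmdlets and a hypothetical VM named win2016-01:

# Connect-VIServer vcenter-mgmt.vcf.sddc.lab must already have been run
$vm = Get-VM -Name "win2016-01"   # hypothetical VM name
New-VTpm -VM $vm                  # adds a virtual TPM 2.0 device; requires EFI, hardware version 14 and a KMS for VM encryption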
Can you vMotion a machine with vTPM?
Yes, you can but Cross vCenter vMotion of an encrypted VM is not supported.
Does the host need a physical TPM to run a virtual TPM?
With vTPM, the physical host does not have to be equipped with a TPM module device. Everything is taken care of by the software by using the .nvram file to contain the contents of the vTPM hardware. The file is encrypted using virtual machine encryption and a KMS server.
Don't think about what can happen in a month. Don't think what can happen in a year. Just focus on the 24 hours in front of you and do what you can to get closer to where you want to be :-)