Archive for VMware

A dive into Host Profiles on vSphere 6.5

Host Profiles

As virtual infrastructures grow, it can become increasingly difficult and time consuming to configure multiple hosts in similar ways. Existing per-host processes typically involve repetitive and error-prone configuration steps. As a result, maintaining configuration consistency and correctness across the datacenter requires increasing amounts of time and expertise, leading to increased operational costs. Host Profiles eliminates per-host, manual or UI-based host configuration and maintains configuration consistency and correctness across the datacenter by using Host Profiles policies. These policies capture the blueprint of a known, validated reference host configuration, including the networking, storage, security and other settings.

You can then use this profile to:

• Automate host configuration across a large number of hosts and clusters. You can use Host Profiles to simplify the host provisioning process, configure multiple hosts in a similar way, and reduce the time spent on configuring and deploying new VMware ESX/ESXi hosts.

• Monitor for host configuration errors and deviations. You can use Host Profiles to monitor for host configuration changes, detect errors in host configuration, and ensure that the hosts are brought back into a compliant state. With Host Profiles, the time required to set up, change, audit and troubleshoot configurations drops dramatically due to centralized configuration and compliance checking. Not only does it reduce labor costs, but it also minimizes the risk of downtime for applications/virtual machines provisioned to misconfigured systems.

Accessing Host Profiles

Click Home > Host Profiles

You should see the Host Profiles view

What can we do with Host Profiles?

  1. Create a Host Profile
  2. Edit a Host Profile
  3. Extract a Host Profile from a host
  4. Attach a Host Profile to a host or cluster
  5. Check compliance
  6. Remediate a host
  7. Duplicate a Host Profile
  8. Copy settings from a host – If the configuration of the reference host changes, you can update the Host Profile so that it matches the reference host’s new configuration
  9. Import a Host Profile – .vpf
  10. Export a Host Profile – .vpf

Steps to create a profile

Host Profiles automates host configuration and ensures compliance in four steps:

Step 1: Create a profile, using the designated reference host. To create a host profile, VMware vCenter Server retrieves and encapsulates the configuration settings of an existing VMware ESX/ESXi host into a description that can be used as a template for configuring other hosts. These settings are stored in the VMware vCenter Server database and can be exported into the VMware profile format (.vpf).

Step 2: Attach a profile to a host or cluster. After you create a host profile, you can attach it to a particular host or cluster. This enables you to compare the configuration of a host against the appropriate host profile.

Step 3: Check the host’s compliance against a profile. Once a host profile is created and attached to a set of hosts or clusters, VMware vCenter Server monitors the configuration settings of the attached entities and detects any deviations from the specified “golden” configuration encapsulated by the host profile.

Step 4: Apply the host profile of the reference host to other hosts or clusters of hosts. If there is a deviation, VMware vCenter Server determines the configuration that applies to a host. To bring noncompliant hosts back to the desired state, the VMware vCenter Server Agent applies a host profile by passing host configuration change commands to the VMware ESX/ESXi host agent through the vSphere API.
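If you prefer to script these four steps, PowerCLI has cmdlets covering the same workflow. A minimal sketch, assuming an active Connect-VIServer session; the host, cluster and file names are placeholders:

# Step 1: extract a profile from the designated reference host
$hp = New-VMHostProfile -Name "GoldProfile" -ReferenceHost (Get-VMHost "esxi01")

# Step 2: attach (associate) the profile to a cluster without applying it yet
Apply-VMHostProfile -Entity (Get-Cluster "Cluster01") -Profile $hp -AssociateOnly

# Step 3: check compliance for every host in the cluster
Test-VMHostProfileCompliance -VMHost (Get-Cluster "Cluster01" | Get-VMHost)

# Step 4: remediate a non-compliant host (the host must be in maintenance mode)
Set-VMHost -VMHost (Get-VMHost "esxi02") -State Maintenance
Apply-VMHostProfile -Entity (Get-VMHost "esxi02") -Profile $hp -ApplyOnly

# Profiles can also be exported to, and imported from, the .vpf format
Export-VMHostProfile -Profile $hp -FilePath "C:\profiles\gold.vpf"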

Steps to create a host profile

1. In the Host Profiles view, click Extract Profile from a host

2. You should get a wizard pop-up. Choose the vCenter, followed by the host you want to extract the profile from

3. Put in a name and description

4. Ready to Complete

5. A Host Profile will be created and will appear in the Host Profiles section

6. Edit the settings of the Host Profile by right-clicking on the profile and clicking Edit Settings

7. The Edit Host Profile screen will pop up

8. Click Next to get to the Settings screen

9. When you edit the Host Profile, you can expand its configuration hierarchy to see the sub-profile components that comprise it. These components are categorised by functional group or resource class to make it easier to find a particular parameter. Each sub-profile component contains one or more attributes and parameters, along with the policies and compliance checks.

10. You can also mark settings as favourites by clicking the yellow star. You can then click View > Favourites to simplify searching for settings.

11. For example, we have a default shared datastore for storing logs, with each host using its own unique name. This saves us time configuring it manually.

12. Note: There is an important setting if you are using a host profile with Auto Deploy. It dictates how ESXi is installed and how the install behaves on future reboots. vSphere has introduced new options, described below, for deploying hosts. I will be writing a further blog about Auto Deploy using these settings.

Stateless Caching

Upon provisioning, the ESXi image is written, or cached, to the host’s local (internal) or USB disk. This option is particularly useful when multiple ESXi hosts are being provisioned concurrently: rather than saturating the network, ESXi is re-provisioned from the cached image on the local or USB disk. However, problems such as the following can occur.

a) If the vCenter Server is available but the vSphere Auto Deploy server is unavailable, hosts do not connect to the vCenter Server system automatically. You can manually connect the hosts to the vCenter Server, or wait until the vSphere Auto Deploy server is available again.

b) If both vCenter Server and vSphere Auto Deploy are unavailable, you can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.

c) If vCenter Server is not available, vSphere DRS does not work. The vSphere Auto Deploy server cannot add hosts to the vCenter Server. You can connect to each ESXi host by using the VMware Host Client, and add virtual machines to each host.

d) If you make changes to your setup while connectivity is lost, the changes are lost when the connection to the vSphere Auto Deploy server is restored.

Stateful Install

When the host first boots, it will pull the image from the Auto Deploy server; on all subsequent restarts the host boots from the locally installed image, just as with a manually built host. With stateful installs, ensure that the host is set to boot from disk first, followed by network boot.

13. Once we have finished customising our profile, we can save it; then we need to attach it to our hosts.

14. Click the Attach/Detach Hosts and Clusters button within Host Profiles. A wizard will appear. I’m just going to test one of my hosts first, so I select it and click Attach. Keep Skip Host Customization unticked so we can see where any missing information needs entering.

15. You will likely get some host customization errors, as I did: I needed to fill in the DNS name of my host and add a username and password to join the hosts to the domain.

16. Next click on the button to check host compliance

17. I can see that one of my hosts is not compliant so I will see what I need to adjust

18. So I double-check all my settings and find that, yes, there is a mismatch in the firewall config for esxupdate, and there are different values between hosts for the syslog settings. I’ll adjust these and run Check Host Compliance again.

19. Lo and behold, I now have 3 compliant hosts 🙂

Reference Host setup for Autodeploy

A well-designed reference host connects to all services such as syslog, NTP, and so on. The reference host setup might also include security, storage, networking, and ESXi Dump Collector. You can apply such a host’s setup to other hosts by using host profiles.

The exact setup of your reference host depends on your environment, but you might consider the following customization.

NTP Server Setup

When you collect logging information in large environments, you must make sure that log times are coordinated. Set up the reference host to use the NTP server in your environment that all hosts can share. You can specify an NTP server by running the vicfg-ntp command. You can start and stop the NTP service for a host with the vicfg-ntp command, or the vSphere Web Client.

Edit the Host profile with the settings for your NTP service
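The NTP settings can also be configured on the reference host with PowerCLI before the profile is extracted; a minimal sketch, with the host and NTP server names as placeholders:

# Point the reference host at a shared NTP server and set the ntpd service to start with the host
$esx = Get-VMHost "esxi01"
Add-VMHostNtpServer -VMHost $esx -NtpServer "0.uk.pool.ntp.org"
Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "ntpd" } | Set-VMHostService -Policy On
Get-VMHostService -VMHost $esx | Where-Object { $_.Key -eq "ntpd" } | Start-VMHostService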

Syslog Server Setup

All ESXi hosts run a syslog service (vmsyslogd), which logs messages from the VMkernel and other system components to a file. You can specify the log host and manage the log location, rotation, size, and other attributes by running the esxcli system syslog vCLI command or by using the vSphere Web Client. Setting up logging on a remote host is especially important for hosts provisioned with vSphere Auto Deploy that have no local storage. You can optionally install the vSphere Syslog Collector to collect logs from all hosts.

Edit the Host profile with the below 2 settings
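As an alternative to editing the profile by hand, the remote log host can be set on the reference host itself; a PowerCLI sketch, with the host name and syslog target as placeholders:

# Set the remote syslog server on the reference host
Set-VMHostSysLogServer -VMHost (Get-VMHost "esxi01") -SysLogServer "syslog.techlab.local" -SysLogServerPort 514
# The esxcli route exposes the same setting plus log dir/rotation/size options
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01") -V2
$esxcli.system.syslog.config.set.Invoke(@{loghost = "udp://syslog.techlab.local:514"})
$esxcli.system.syslog.reload.Invoke()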

Core Dump Setup

You can set up your reference host to send core dumps to a shared SAN LUN, or you can enable ESXi Dump Collector in the vCenter appliance and configure the reference host to use ESXi Dump Collector. After setup is complete, VMkernel memory is sent to the specified network server when the system encounters a critical failure.

Turn on the Dump Collector service in vCenter

Configure the host profile to enable and point the host to the vCenter on port 6500
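The same configuration can be pushed to the reference host via esxcli from PowerCLI; a sketch, with the host name and vCenter address as placeholders:

# Point the host's network core dumps at the Dump Collector on vCenter (port 6500) and verify
$esxcli = Get-EsxCli -VMHost (Get-VMHost "esxi01") -V2
$esxcli.system.coredump.network.set.Invoke(@{interfacename = "vmk0"; serveripv4 = "192.168.0.10"; serverport = 6500})
$esxcli.system.coredump.network.set.Invoke(@{enable = $true})
$esxcli.system.coredump.network.get.Invoke()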

Security Setup

In most deployments, all hosts that you provision with vSphere Auto Deploy must have the same security settings. For example, you can set up the firewall to allow certain services to access the ESXi system, and set up the security configuration, user configuration, and user group configuration for the reference host with the vSphere Web Client or with vCLI commands. Security setup includes shared user access settings for all hosts. You can achieve unified user access by setting up your reference host to use Active Directory. See the vSphere Security documentation.

vCenter 6.5U2c VCHA Error

Random problem when setting up vCenter HA

So this is an interesting one because I don’t have a solution but it is now working so I can only explain what happened. The 3 blades hosting the vCenter were HPE Proliant BL460c Gen10 servers. Once I reached the end of configuring a 6.5U2c vCenter for vCenter HA, I received the following error message.

So after going back and double-checking typos and the distributed switch and port group settings, everything looked fine, but this error specifically mentioned a host, vmh01. So I decided to run the VCHA wizard again, which produced the same error but listed the second host.


As I had 3 hosts in the cluster, I decided to run the wizard a third time, which errored on the third host; running it a fourth time, the VCHA setup ran perfectly and finished without any problems. There was no problem with the vDS, port groups or general networking.

The great thing about VCHA is that in this instance it rolls everything back, so you can simply start again. You might ask why I haven’t taken a snapshot – well, it doesn’t allow you to do this! The rollback works very well, in fact 3 times in this scenario. Obviously not so good if you have hundreds of hosts 😀 A very strange problem where the NICs seemed to need a push before deciding to work, but it did work in the end.

VMware Network Port Diagram v6.x

This is a short post because I came across a really useful link to a PDF document showing how vCenter and ESXi are connected. It shows which ports are used and the direction the traffic travels, which is really useful for understanding the internal and external communication, with explanations for each port.

Link (downloadable PDF within the KB):

https://kb.vmware.com/s/article/2131180

Installing vCenter HA – 6.5U2c

Installing vCenter HA

The vCenter High Availability architecture uses a three-node cluster to provide availability against multiple types of hardware and software failures. A vCenter HA cluster consists of one Active node that serves client requests, one Passive node that takes over the Active role in the event of failure, and one quorum node called the Witness node. Any Active/Passive architecture that supports automatic failover relies on a quorum, or tie-breaking entity, to solve the classic split-brain problem: data/availability inconsistencies due to network failures within distributed systems maintaining replicated data. Traditional architectures use some form of shared storage to solve the split-brain problem. However, in order to support a vCenter HA cluster spanning multiple datacenters, VMware’s design does not assume a shared-storage-based deployment. As a result, one node in the vCenter HA cluster is permanently designated as the quorum node, or Witness node. The other two nodes in the cluster dynamically assume the roles of Active and Passive nodes.

vCenter Server availability is assured as long as there are two nodes running inside the cluster. However, a cluster with only two nodes left is considered to be running in a degraded state, and a subsequent failure in a degraded cluster means vCenter services are no longer available.

A vCenter Server appliance is stateful and requires a strong, consistent state for it to work correctly. The appliance state (configuration state or runtime state) is mainly composed of:

• Database data (stored in the embedded PostgreSQL database)
• Flat files (for example, configuration files).

The appliance state must be replicated in order for VCHA failover to work properly. For the state stored inside the PostgreSQL database, the PostgreSQL native replication mechanism is used to keep the database data of the primary and secondary in sync. For flat files, a Linux native solution, rsync, is used for replication. Because the vCenter Server appliance requires strong consistency, a synchronous form of replication must be used to replicate the appliance state from the Active node to the Passive node.

Installation steps

  • Download the relevant vCenter Server Appliance ISO from the VMware download page
  • Mount the iso from a workstation or server

  • We’ll now go through the process of installing the first vCenter Server. I have mounted the iso on my Windows 10 machine
  • Go to vcsa-ui-installer > win32 > installer.exe

  • Click Install

  • Click Next

  • Click Accept License Agreement

  • Select Embedded Platform Services Controller. Note you can deploy an external PSC. I am doing the install this way as I want to test the embedded linked mode functionality now available in 6.5U2+ between embedded Platform Services Controllers (this will require the build of another vCenter HA with an embedded PSC, which I’ll try to cover in another blog)

  • Next put in the details for a vCenter or host as the deployment target

  • Select the Certificate

  • Put in an appliance name and a root password for the new vCenter appliance

  • Choose the deployment size and the Storage Size. Click Next

  • Choose the datastore to locate the vCenter on. Note: I am running vSAN.

  • Configure network settings. Note: As I chose a host to deploy to, it does not give me any existing vDS port groups. I have chosen to deploy to a host rather than an existing vCenter as I am testing this for a Greenfield build at work which does not have any existing vCenters etc to start with, just hosts.
  • Note: It would be useful at this point to make sure you have entered the new vCenter name and IP address into DNS.

  • Check all the details are correct

  • Click Finish. It should now say Initializing and start deploying

  • You should see the appliance is being deployed.

  • When the deployment has finished, you should see this screen.

  • You can carry on with Step 2 here, but I closed the wizard at this point and I’m now going to log in to my vCenter and configure the appliance settings on https://techlabvca002.techlab.local:5480
  • Click Set up vCenter Server Appliance

  • Log in to the vCenter

  • The below screen will pop up. Click Next

  • Check all details
  • Put in time servers. I’m connected to the internet through my environment so I use some generic time servers
  • Enable SSH if you need to – it can be turned off again after configuration for security.

  • Put in your own SSO configuration
  • Click Next

  • Select or unselect the CEIP

  • Check all the details and click Finish

  • A message will pop up

  • The vCenter Appliance will begin the final installation

  • When complete, you should see the following screen

  • You can now connect to the vCenter Appliance on the 5480 port and the Web Client

  • Note: at this point I actually switched to enabling VCHA on my normal, first-built vCenter, techlabvca001. I should have added my second vCenter into the same SSO domain as techlabvca001, but I had set it up as a completely separate vCenter, so it wouldn’t let me enable VCHA the way I had configured it. Log in to the vSphere Web Client for techlabvca001
  • Highlight vCenter
  • Click the Configure tab
  • Choose Basic

  • Put in the Active vCenter’s HA address and subnet mask
  • Choose a port group

  • Click Next
  • Select Advanced and change the IP settings to what you want

  • Passive Node

  • And the Witness Node

  • Click Next and you will be on the next screen which allows you to specify what location and datastores you can use to place the nodes

  • Click Edit on the Passive Node

  • Select the Compute Resource

  • Choose a datastore – In my case this will be my vSAN

  • Check the Compatibility checks – in my case it is just notifying me that snapshots will be lost when this is created.

  • Next adjust the Witness settings – I am not going to go through them all again as they will be the same as the Passive node we just did.

  • Check the Management network and vCenter HA networks

  • Click Next, check the final details and click Finish

  • The vSphere Web Client will now say vCenter HA is being deployed

  • You should see a Peer machine and a Witness machine being deployed

  • Once complete you will see VCHA is enabled and you should see your Active vCenter, Passive vCenter and Witness

  • Click Test Failover to check everything is working as expected

  • You can also place the HA Cluster in several modes (Enabled, Maintenance Mode or Disabled)

Updating SSL Certificates on vCenter and Platform Services Controllers

vCenter services use SSL to communicate securely with each other and with ESXi. SSL communications ensure data confidentiality and integrity. Data is protected and cannot be modified in transit without detection.

vCenter Server services such as the vSphere Web Client also use certificates for initial authentication to vCenter Single Sign-On. vCenter Single Sign-On provisions each set of services (solution user) with a SAML token that the solution user can authenticate with.

In vSphere 6.0 and later, the VMware Certificate Authority (VMCA) provisions each ESXi host and each vCenter Server service with a certificate that is signed by VMCA by default.

You can replace the existing certificates with new VMCA-signed certificates, make VMCA a subordinate CA, or replace all certificates with custom certificates.

Requirements for imported certificates

  • Key size: 2048 bits or more (PEM encoded)
  • PEM format. VMware supports PKCS8 and PKCS1 (RSA keys). When you add keys to VECS, they are converted to PKCS8.
  • x509 version 3
  • SubjectAltName must contain DNS Name=machine_FQDN
  • CRT format
  • Contains the following Key Usages: Digital Signature, Key Encipherment.
  • Client Authentication and Server Authentication cannot be present under Enhanced Key Usage

VMCA does not support the following certificates:

  1. Certificates with wildcards
  2. The algorithms md2WithRSAEncryption 1.2.840.113549.1.1.2, md5WithRSAEncryption 1.2.840.113549.1.1.4, and sha1WithRSAEncryption 1.2.840.113549.1.1.5 are not recommended.
  3. The algorithm RSASSA-PSS with OID 1.2.840.113549.1.1.10 is not supported.

The work required for setting up or updating your certificate infrastructure depends on the requirements in your environment, on whether you are performing a fresh install or an upgrade, and on whether you are considering ESXi or vCenter Server.

What is the VMCA?

The VMware Certificate Authority (VMCA) is the default root certificate authority introduced in vSphere 6.0 that supplies the certificates to ensure communication over SSL between vCenter Server components and ESXi nodes in the virtualized infrastructure.

The VMCA is included in the Platform Services Controller and provides certificates for

  • Solution users (Replacing Solution user certificates is not normally required by company policy)
  • Machines that have running services
  • ESXi hosts. An ESXi host gets a signed certificate, stored locally on the server, from the VMCA. For environments that require a different root authority, an administrator must change the option in vCenter to stop automatically provisioning VMCA certificates to hosts.

If you do not currently replace VMware certificates, your environment starts using VMCA-signed certificates instead of self-signed certificates.

What is the VECS?

VMware Endpoint Certificate Store (VECS) serves as a local (client-side) repository for certificates, private keys, and other certificate information that can be stored in a keystore. You can decide not to use VMCA as your certificate authority and certificate signer, but you must use VECS to store all vCenter certificates, keys, and so on. ESXi certificates are stored locally on each host and not in VECS. VECS runs on every embedded deployment, Platform Services Controller node, and management node and holds the keystores that contain the certificates and keys.
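You can browse VECS from the appliance shell with the vecs-cli utility that ships on the VCSA/PSC; for example, listing the stores and then the Machine SSL entry:

/usr/lib/vmware-vmafd/bin/vecs-cli store list
/usr/lib/vmware-vmafd/bin/vecs-cli entry list --store MACHINE_SSL_CERT --text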

How does VMCA deal with certificates?

With VMCA you can deal with certificates in three different ways.

  • VMCA Default

VMCA uses a self-signed root certificate. It issues certificates to vCenter, ESXi, etc and manages these certificates. These certificates have a chain of trust that stops at the VMCA root certificate. VMCA is not a general-purpose CA and its use is limited to VMware components.

  • VMCA Enterprise

VMCA is used as a subordinate CA and is issued a subordinate CA signing certificate. It can now issue certificates that chain up to the enterprise CA’s root certificate. If you have already issued certs using VMCA Default and you replace VMCA’s root cert with a CA signing cert, then all certificates issued will be regenerated and pushed out to the components.

  • Custom

In this scenario VMCA is completely bypassed. This scenario is for customers that want to issue and/or install their own certificates from their own internal PKI, or third-party signed certificates generated from an external PKI such as Verisign or GoDaddy. You will need to issue a cert for every component, and all those certs (except for host certs) need to be installed into VECS.

In Default and Enterprise modes, VMCA certificates can be easily regenerated on demand.

Certificate Manager Tool

For vSphere 6, the procedure for installing certificates has changed. A new Certificate Manager tool is shipped as part of vCenter for Windows and the VCSA. The location on the VCSA is below:

/usr/lib/vmware-vmca/bin/certificate-manager

Deployments

I’m going to use a custom deployment method to change just the machine certs, but not the ESXi host or Solution certificates.

Hybrid Deployment

You can have VMCA supply some of the certificates, but use custom certificates for other parts of your infrastructure. For example, because solution user certificates are used only to authenticate to vCenter Single Sign-On, consider having VMCA provision those certificates. Replace the machine SSL certificates with custom certificates to secure all SSL traffic.

Company policy often does not allow intermediate CAs. For those cases, hybrid deployment is a good solution. It minimizes the number of certificates to replace and secures all traffic. The hybrid deployment leaves only internal traffic, that is, solution user traffic, using the default VMCA-signed certificates.

Where vSphere uses certificates

ESXi Certificates

  • Stored locally on each host in the /etc/vmware/ssl directory
  • ESXi certificates are provisioned by VMCA by default when the host is first added to vCenter and when the host reconnects, but you can use custom certificates instead

Machine SSL Certificates

  • The machine SSL certificate for each node is used to create an SSL socket on the server side
  • SSL clients connect to the SSL socket
  • Used for server verification and for secure communications such as HTTPS or LDAPS
  • Each node has its own machine SSL certificate. Nodes include vCenter, Platform Services Controller or embedded deployment instance
  • VMware products use standard X.509 version 3 (X.509v3) certificates to encrypt session information. Session information is sent over SSL between components.

The following services use the machine SSL certificate

  • The reverse proxy service on each Platform Services Controller node. SSL connections to individual vCenter services always go to the reverse proxy. Traffic does not go to the services themselves.
  • The vCenter service (vpxd) on management nodes and embedded nodes.
  • The VMware Directory Service (vmdir) on infrastructure nodes and embedded nodes.

Solution User Certificates

  • A solution user encapsulates one or more vCenter Server services. Each solution user must be authenticated to vCenter Single Sign-On. Solution users use certificates to authenticate to vCenter Single Sign-On through SAML token exchange

  • A solution user presents the certificate to vCenter Single Sign-On when it first must authenticate, after a reboot, and after a timeout has elapsed. The timeout (Holder-of-Key Timeout) can be set from the vSphere Web Client or Platform Services Controller Web interface and defaults to 2592000 seconds (30 days).

The following solution user certificate stores are included in VECS on each management or embedded node.

Managing certificates with the vSphere Certificate Manager Utility

There are a few ways of managing certificates but I am going to run through the vSphere Certificate Manager Utility.

The vSphere Certificate Manager utility allows you to perform most certificate management tasks interactively from the command line. vSphere Certificate Manager prompts you for the task to perform, for certificate locations and other information as needed, and then stops and starts services and replaces certificates for you.

If you use vSphere Certificate Manager, you are not responsible for placing the certificates in VECS (VMware Endpoint Certificate Store) and you are not responsible for starting and stopping services.

Before you run vSphere Certificate Manager, be sure you understand the replacement process and procure the certificates that you want to use.

Certificate Manager Utility Location

Procedure

  • First of all I need to create a template in my own internal Certificate Authority. I’m going to follow the article below, with steps and screenshots to show what I’m doing.

https://kb.vmware.com/s/article/211200

  1. Connect to the CA server that you will be generating the certificates from, through an RDP session.
  2. Click Start > Run, type certtmpl.msc, and click OK.
  3. In the Certificate Template Console, under Template Display Name, right-click Web Server and click Duplicate Template.

  • In the Duplicate Template window, select Windows Server 2003 Enterprise for backward compatibility. Note: If you have an encryption level higher than SHA1, select Windows Server 2008 Enterprise.

  • Click the General Tab
  • In the Template display name field, enter vSphere 6.x as the name of the new template

  1. Click the Extensions tab.
  2. Select Application Policies and click Edit.
  3. Select Server Authentication and click Remove, then OK.
  4. Note: If Client Authentication exists, remove this from Application Policies as well.

  • Select Key Usage and click Edit
  • Select the Signature is proof of origin (nonrepudiation) option. Leave all other options as default
  • Click OK.

  • Click on the Subject Name tab.
  • Ensure that the Supply in the request option is selected.
  • Click OK to save the template.

Next: add the new template to the certificate templates section to make the newly created certificate template available

  • Connect to the CA server that you will be generating the certificates from, through an RDP session.
  • Click Start > Run, type certsrv.msc, and click OK.
  • In the left pane of the Certificate Console, if collapsed, expand the node by clicking the + icon.

  • Right-click Certificate Templates and click New > Certificate Template to Issue.
  • Locate vSphere 6.x or vSphere 6.x VMCA under the Name column.
  • Click OK.

  • You will then see the certificate template

Next: Create a folder on the VCSA for uploading and downloading certificates

  • WinSCP into the VCSAs/PSCs and create a folder that you can upload and download to. E.g. /tmp/machine_ssl

shell.set --enabled True

shell

chsh -s /bin/bash root

Generate the Certificate signing request

Note: If you have external PSCs, then do these first before doing the vCenters.

The Machine SSL certificate is the certificate you get when you open the vSphere Web Client in a web browser. It is used by the reverse proxy service on every management node, Platform Services Controller, and embedded deployment. You can replace the certificate on each node with a custom certificate.

  1. In PuTTY, navigate to /usr/lib/vmware-vmca/bin/ and run ./certificate-manager
  2. Put in the administrator@vsphere.local account and password

  1. Select Option 1 to Replace Machine SSL certificate with Custom Certificate
  2. Put in the path to the /tmp/machine_ssl folder on the appliance

  • Enter all the relevant cert info
    • Output directory path: the path where the private key and the request will be generated
    • Country: your country, in two letters
    • Name: the FQDN of your vCSA
    • Organization: an organization name
    • OrgUnit: the name of your unit
    • State: your state or province
    • Locality: your city
    • IPAddress: the vCSA IP address
    • Email: your e-mail address
    • Hostname: the FQDN of your vCSA
    • VMCA Name: the FQDN where your VMCA is located. Usually the vCSA FQDN

  • You will then see the generated csr and key in the /tmp/machine_ssl folder

  • Open the vmware_issued_csr.csr file and copy the contents

Next: Request a certificate from your CA.

The next step is to use the CSR to request a certificate from your internal Certificate Authority.

  1. Log in to the Microsoft CA certificate authority Web interface. By default, it is http://CA_Server_FQDN/CertSrv.
  2. Click the Request a certificate link.

  1. Click advanced certificate request.

  1. Click the Submit a certificate request by using a base-64-encoded CMC or PKCS #10 file, or submit a renewal request by using a base-64-encoded PKCS #7 file link.

  1. Open the certificate request (machine_ssl.csr) in a plain text editor and copy everything from -----BEGIN CERTIFICATE REQUEST----- to -----END CERTIFICATE REQUEST----- into the Saved Request box.

  • On the download page, select “Base 64 encoded” and click on “Download Certificate”. The downloaded file will be called “certnew.cer”. Rename this to “machine_ssl.cer”
  • Go back to the download web page and click on “Download certificate chain” (ensuring that “Base 64 encoded” is still selected). The downloaded file will be called “certnew.p7b”. Rename this to “cachain.p7b”

  • We are now going to export the CA Root certificate from the cachain.p7b file. Right-click on the cachain.p7b file and select “Open”

  • Expand the list and click on the Certificates folder. Right-click on the CA root cert (techlab-TECHLABDCA001-CA in this example), select All Tasks…Export

Select Base 64 encoded

  • Save the file as root-64.cer

  • You should now have the machine_ssl.cer file and the root-64.cer file
  • Using WinSCP copy the machine_ssl.cer and root-64.cer certificate files to the VCSA.
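Before importing, it is worth sanity-checking the issued certificate on the Windows side; a quick PowerShell check (the file path is hypothetical):

# Load the issued cert and confirm the subject, issuer and expiry look right
$cert = New-Object System.Security.Cryptography.X509Certificates.X509Certificate2("C:\certs\machine_ssl.cer")
$cert.Subject
$cert.Issuer
$cert.NotAfter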

  • Now that the files have been copied, open the Certificate Manager Utility and select Option 1, Replace Machine SSL certificate with Custom Certificate.

  • Provide the password to your administrator@vsphere.local account and select Option 2, “Import Custom Certificate(s) and key(s) to replace existing Machine SSL certificate”

  • You will be prompted for following files:
  • machine_ssl.cer
  • machine_ssl.key
  • root-64.cer

  • Type Y to begin the process
  • It will kick off the install

  • You should get a message to say that everything is completed

  • Now check to see if everything has gone to plan. One thing to remember before we start: because the new Machine SSL cert has been issued by the CA on the domain controller, you may need to install the root-64.cer file into the browser. Once done, close the browser and log into the vSphere Web Client.
  • Now open your vCenter login page and check the certificate being used to protect it

  • You’ll see that the certificate has been verified by “techlab-TECHLABADC001-CA”. This is the CA running on the Windows domain controller.

vSAN Stretched Cluster networking

A vSAN Stretched Cluster is a specific configuration implemented in environments where disaster/downtime avoidance is a key requirement. Setting up a stretched cluster can be daunting. More in terms of the networking side than anything else. This blog isn’t meant to be chapter and verse on vSAN stretched clusters. It is meant to help anyone who is setting up the networking, static routes and ports required for a L2 and L3 implementation.

VMware vSAN Stretched Clusters with a Witness Host refers to a deployment where a user sets up a vSAN cluster with 2 active/active sites with an identical number of ESXi hosts distributed evenly between the two sites. The sites are connected via a high bandwidth/low latency link.

The third site hosting the vSAN Witness Host is connected to both of the active/active data-sites. This connectivity can be via low bandwidth/high latency links.

Each site is configured as a vSAN Fault Domain. The way to describe a vSAN Stretched Cluster configuration is X+Y+Z, where X is the number of ESXi hosts at data site A, Y is the number of ESXi hosts at data site B, and Z is the number of witness hosts at site C. Data sites are where virtual machines are deployed. The minimum supported configuration is 1+1+1 (3 nodes). The maximum configuration is 15+15+1 (31 nodes). In vSAN Stretched Clusters, there is only one witness host in any configuration.

A virtual machine deployed on a vSAN Stretched Cluster will have one copy of its data on site A, a second copy of its data on site B and any witness components placed on the witness host in site C.

Types of networks

VMware recommends the following network types for Virtual SAN Stretched Cluster:

  • Management network: L2 stretched or L3 (routed) between all sites. Either option works fine. The choice is left up to the customer.
  • VM network: VMware recommends L2 stretched between data sites. In the event of a failure, the VMs will not require a new IP to work on the remote site
  • vMotion network: L2 stretched or L3 (routed) between data sites should both work fine. The choice is left up to the customer.
  • Virtual SAN network: VMware recommends L2 stretched between the two data sites and L3 (routed) network between the data sites and the witness site.

The major consideration when implementing this configuration is that each ESXi host comes with a default TCP/IP stack and, as a result, only has a single default gateway. The default route is typically associated with the management network TCP/IP stack. The solution to this issue is to use static routes, which allow an administrator to define a new routing entry indicating which path should be followed to reach a particular network. Static routes are needed between the data hosts and the witness host for the vSAN network, but they are not required for the data hosts on different sites to communicate with each other over the vSAN network. However, in the case of stretched clusters, it might also be necessary to add a static route from the vCenter server to reach the management network of the witness ESXi host if it is not routable, and similarly a static route may need to be added from the ESXi witness management network to reach the vCenter server. This is because the vCenter server will route all traffic via the default gateway.

vSAN Stretched Cluster Visio diagram

The diagram below is for reference, and beneath it the static routes are listed so it is clear what needs to connect.

Static Routes

The recommended static routes are

  • Hosts on the Preferred Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
  • Hosts on the Secondary Site have a static route added so that requests to reach the witness network on the Witness Site are routed out the vSAN VMkernel interface
  • The Witness Host on the Witness Site has a static route added so that requests to reach the Preferred Site and Secondary Site are routed out the WitnessPg VMkernel interface

On each host on the Preferred and Secondary site

These were the manual routes added

  • esxcli network ip route ipv4 add -n 192.168.1.0/24 -g 172.31.216.1 (192.168.1.0/24 being the witness vSAN network and 172.31.216.1 being the gateway on the hosts’ vSAN VMkernel network)
  • esxcli network ip route ipv4 list will show you the networking
  • vmkping -I vmk1 192.168.1.10 will confirm via ping that the network is reachable

On the witness

These were the manual routes added

  • esxcli network ip route ipv4 add -n 172.31.216.0/25 -g 192.168.1.1 (172.31.216.0/25 being the hosts’ vSAN VMkernel network and 192.168.1.1 being the witness vSAN VMkernel gateway)
  • esxcli network ip route ipv4 list will show you the networking
  • vmkping -I vmk1 172.31.216.10 will confirm via ping that the network is reachable
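The same routes can be pushed with PowerCLI rather than SSHing into every node; a sketch using the example addresses above, with the cluster and witness host names as placeholders:

# Data-site hosts: add the route to the witness vSAN network
Get-Cluster "StretchedCluster" | Get-VMHost | ForEach-Object {
    New-VMHostRoute -VMHost $_ -Destination 192.168.1.0 -PrefixLength 24 -Gateway 172.31.216.1 -Confirm:$false
}
# Witness host: add the route back to the data-site vSAN network
New-VMHostRoute -VMHost (Get-VMHost "witness01") -Destination 172.31.216.0 -PrefixLength 25 -Gateway 192.168.1.1 -Confirm:$false
# Verify
Get-VMHostRoute -VMHost (Get-VMHost "witness01")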

Port Requirements

Virtual SAN Clustering Service

12345, 23451 (UDP)

Virtual SAN Cluster Monitoring and Membership Directory Service. Uses UDP-based IP multicast to establish cluster members and distribute Virtual SAN metadata to all cluster members. If disabled, Virtual SAN does not work.

Virtual SAN Transport

2233 (TCP)

Virtual SAN reliable datagram transport. Uses TCP and is used for Virtual SAN storage I/O. If disabled, Virtual SAN does not work.

vSANVP

8080 (TCP)

vSAN VASA Vendor Provider. Used by the Storage Management Service (SMS) that is part of vCenter to access information about Virtual SAN storage profiles, capabilities and compliance. If disabled, Virtual SAN Storage Profile Based Management does not work

Virtual SAN Unicast agent to witness 

12321 (UDP)

Self-explanatory: needed for unicast traffic from data nodes to the witness.

vSAN Storage Hub

The link below is to the VMware Storage Hub, which is the central location for all things vSAN, including the vSAN Stretched Cluster Guide, which is exportable to PDF. Pages 66-67 are relevant to networking/static routes.

https://storagehub.vmware.com/t/vmware-vsan/vsan-stretched-cluster-2-node-guide/network-design-considerations/

What’s going on with VMware Transparent Page Sharing?

What is Transparent Page Sharing?

When multiple virtual machines are running, some of them may have identical sets of memory content. This presents opportunities for sharing memory across virtual machines (as well as sharing within a single virtual machine). For example, several virtual machines may be running the same guest operating system, have the same applications, or contain the same user data. With page sharing, the hypervisor can reclaim the redundant copies and keep only one copy, which is shared by multiple virtual machines in the host physical memory. As a result, the total virtual machine host memory consumption is reduced and a higher level of memory overcommitment is possible.

What is the security problem related to Transparent Page Sharing currently?

There has been recent academic research that leverages Transparent Page Sharing (TPS) to gain unauthorized access to data under certain highly controlled conditions, and as a precautionary measure VMware is restricting TPS to individual virtual machines by default in upcoming ESXi releases. At this time, VMware believes that the published information disclosure due to TPS between virtual machines is impractical in a real world deployment.

Published academic papers have demonstrated that by forcing a flush and reload of cache memory, it is possible to measure memory timings to try to determine an AES encryption key in use on another virtual machine running on the same physical processor of the host server, if Transparent Page Sharing is enabled between the two virtual machines. This technique works only in a highly controlled system configured in a non-standard way that VMware believes would not be recreated in a production environment.

Even though VMware believes information disclosure in real world conditions is unrealistic, out of an abundance of caution upcoming ESXi Update releases will no longer enable TPS between virtual machines by default (inter-VM TPS). TPS will still be utilized within individual VMs (intra-VM TPS).

What is meant by Intra-VM and Inter-VM in the context of Transparent Page Sharing?

  • Intra-VM means that TPS will de-duplicate identical pages of memory within a virtual machine, but will not share the pages with any other virtual machines.
  • Inter-VM means that TPS will de-duplicate identical pages of memory within a virtual machine and will also share the duplicates with one or more other virtual machines with the same content.

VMware will disable the ability to share memory pages between virtual machines (inter-VM Transparent Page Sharing) by default in coming updates to ESXi 5.0/5.1/5.5 and in the next major ESXi release; inter-VM TPS is not enabled by default as of ESXi 6.0. Administrators may revert to the previous behavior if they so wish.

What could potentially be the effect?

Disabling inter-Virtual Machine TPS may impact performance in environments that rely heavily on memory over-commitment although we still have memory resource management techniques such as

  • Ballooning – Reclaims memory by artificially increasing the memory pressure inside the guest
  • Hypervisor/Host swapping – Reclaims memory by having ESX directly swap out the virtual machine’s memory
  • Memory Compression – Reclaims memory by compressing the pages that need to be swapped out

Please keep reading KB52337 for further information

So what options do we have?

The concept of salting has been introduced to help address concerns system administrators may have over the security implications of TPS. Salting is used to allow more granular management of the virtual machines participating in TPS than was previously possible. As per the original TPS implementation, multiple virtual machines could share pages when the contents of the pages were same. With the new salting settings, the virtual machines can share pages only if the salt value and contents of the pages are identical. A new host config option Mem.ShareForceSalting is introduced to enable or disable salting.

By default, salting is enabled after the ESXi update releases mentioned above are deployed, (Mem.ShareForceSalting=2) and each virtual machine has a different salt. This means page sharing does not occur across the virtual machines (inter-VM TPS) and only happens inside a virtual machine (intra VM).

When salting is enabled (Mem.ShareForceSalting=1 or 2) in order to share a page between two virtual machines both salt and the content of the page must be same. A salt value is a configurable vmx option for each virtual machine. You can manually specify the salt values in the virtual machine’s vmx file with the new vmx option sched.mem.pshare.salt. If this option is not present in the virtual machine’s vmx file, then the value of vc.uuid vmx option is taken as the default value. Since the vc.uuid is unique to each virtual machine, by default TPS happens only among the pages belonging to a particular virtual machine (Intra-VM).
If a group of virtual machines are considered trustworthy, it is possible to share pages among them by setting a common salt value for all those virtual machines (inter-VM).

The following table shows how different settings for TPS are used together to affect how TPS operates for individual virtual machines: 

What is the default behavior of Transparent Page Sharing in above mentioned Update releases?

By default, the setting is (Mem.ShareForceSalting=2) and each virtual machine has a different salt (that is sched.mem.pshare.salt is not present) which means that only Intra-VM page sharing is enabled. This behavior is new as per these ESXi update releases and page sharing will not happen across the virtual machines (inter-VM TPS) by default. 

How can I enable or disable salting? 

  1. Log in to ESX(i)/vCenter with the VI Client.
  2. Select the relevant ESX(i) host.
  3. In the Configuration tab, click Advanced Settings (link) under the Software section.
  4. In the Advanced Settings window, click Mem.
  5. Search for Mem.ShareForceSalting and set the value to 1 or 2 (enable salting) or 0 (disable salting).
  6. Click OK.
  7. For the changes to take effect, do either of the two (or script the setting itself; see the sketch below):
    • Migrate all the virtual machines to another host in the cluster and then back to the original host, or
    • Shut down and power on the virtual machines.
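The same change can be scripted across a whole cluster with PowerCLI; a minimal sketch, assuming a connected session and a placeholder cluster name:

# 2 = salting on with a per-VM salt (intra-VM sharing only); 1 = salting honoured via sched.mem.pshare.salt; 0 = classic inter-VM TPS
Get-Cluster "Cluster01" | Get-VMHost | ForEach-Object {
    Get-AdvancedSetting -Entity $_ -Name "Mem.ShareForceSalting" | Set-AdvancedSetting -Value 2 -Confirm:$false
}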

How can I allow inter-VM TPS between two or more virtual machines?

Inter-VM TPS is enabled for two or more virtual machines by enabling salting and by giving them the same salt value.

How can I specify salt value of a virtual machine?

  1. Power off the virtual machine on which you want to set the salt value.
  2. Right-click the virtual machine and click Edit Settings.
  3. Select the Options menu, and click General under the Advanced section.
  4. Click Configuration Parameters.
  5. Click Add Row; a new row will be added.
  6. On the LHS add the text sched.mem.pshare.salt and on the RHS specify the unique string.
  7. Power on the virtual machine for salting to take effect.
  8. Repeat steps 1 to 7 to set the salt value for each individual virtual machine (or script it; see the sketch below).
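A PowerCLI sketch with hypothetical VM names, giving two trusted VMs the same salt so they can share pages with each other (power them off first, as above):

# The shared salt string is arbitrary but must match on every VM in the trusted group
"vm01", "vm02" | ForEach-Object {
    New-AdvancedSetting -Entity (Get-VM $_) -Name "sched.mem.pshare.salt" -Value "trusted-group-1" -Confirm:$false
}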

What is the difference in behavior of page sharing when MEM_SHARE_FORCE_SALTING value is set to 1 and 2?

MEM_SHARE_FORCE_SALTING 1: By default the salt value is taken from sched.mem.pshare.salt. If not specified, it falls back to the old TPS (inter-VM) behavior by treating the salt value for the virtual machine as 0.

MEM_SHARE_FORCE_SALTING 2: By default the salt value is taken from vc.uuid. If it does not exist, the page sharing algorithm generates a random and unique value for salting per virtual machine, which is not configurable by users.

How can I prepare for the ESXi Update releases that no longer allow inter-VM TPS by default?

VMware recommends monitoring the free memory available on the host, along with the total ballooned and total swapped memory, before deploying the ESXi update releases listed above that disallow inter-VM TPS. Once inter-VM TPS is disallowed, available free memory might drop, which can in turn lead to increased ballooning and swapping. If increased ballooning and swapping activity is observed along with noticeable performance issues, more physical memory can be added to the host or the memory load on the host can be reduced.
To monitor the stats, run the esxtop command:

  • Run esxtop on the host and press m to switch to memory mode.
  • free in the PMEM/MB row displays the free memory available on the host.
  • curr in the MEMCTL/MB row displays the total ballooned memory.
  • curr in the SWAP/MB row displays the total swapped memory.
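A rough PowerCLI equivalent of those esxtop counters, handy for spot checks across many hosts (the host name is a placeholder):

# Shared, ballooned and swapped memory; latest realtime sample for the host
Get-Stat -Entity (Get-VMHost "esxi01") -Stat mem.shared.average, mem.vmmemctl.average, mem.swapused.average -Realtime -MaxSamples 1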

How can I enable or disable salting for multiple ESXi hosts?

To enable or disable salting for multiple ESXi hosts, refer to the attached PowerCLI script in KB2097593. This script allows toggling pshare salting for the update releases.

Usage

.\pshare-salting.ps1 <vcenter IP/hostname> -s -> Enables pshare salting.
.\pshare-salting.ps1 <vcenter IP/hostname> -o -> Turns off pshare salting and falls back to the default TPS behaviour

Links

KB2080735

KB2097593

Are there any tools we are able to use to compare TPS savings before and after disabling Inter-VM transparent page sharing?

There is a PowerShell script (VMware recommended) called the “Host Memory Assessment Tool” that looks at shared memory per host and reports it in tabular form, so you can easily review the current shared memory savings and the worst-case impact in contrast with the free memory on the host. The script uses plink.exe to remotely SSH into each ESXi host and record memory counters using vsish. There is very low risk and impact to the ESXi hosts as it is a read-only process.

https://www.brianjgraf.com/2015/04/03/assess-impact-tps-vsphere-6/

What the script does:

  • Connects to vCenter and enumerates all ESXi hosts
  • Allows you to enable SSH on selected hosts
  • Generates an assessment report
  • Allows you to export the assessment report to .csv
  • Allows you to easily turn off SSH again if necessary

This tool would need to be run on a normal existing system with workloads, with TPS on and off, to see the different outputs.

VMware Update (20/03/2018)

The latest update, and updates going forward, on the performance impact associated with applying the security patches can be found in https://kb.vmware.com/s/article/52337

Virtualization Layer Mitigations: The latest ESXi patches and the relevant Intel CPU microcode, but without Guest Operating System mitigation patches. These mitigations have a minimal performance impact (< 2%) for most workloads on a representative range of recent Intel Xeon server processors.

Full Stack Mitigations: All levels of mitigation. This includes all virtualization layer mitigations above with the addition of Guest Operating System mitigation patches. As reported in the press, the impact of these mitigations will vary depending on your application. Applications with very heavy system call usage, including those with very high IO rates, will show a more significant impact than their counterparts with lower system call usage. For information regarding the performance impact of Operating System Mitigations on your application, please consult with your Operating system and/or Application vendor. Consistent with our findings above, the virtualization layer mitigations that are part of these full stack mitigations have minimal influence to the overall impact. As a general best practice, we recommend you test the appropriate patches with your applications prior to deploying in production environments.

Recap on Cluster Admission Control in vSphere 6.5/6.5U1

Cluster Admission Control in vSphere 6.5/6.5U1

vSphere HA uses admission control to ensure that sufficient resources are reserved for virtual machine recovery when a host fails. The basis for vSphere HA admission control is how many host failures your cluster is allowed to tolerate while still guaranteeing failover for the VMs onto the remaining hosts. The default Admission Control policy has changed from Slot Policy (the default until 6.5) to Cluster Resource Percentage. VMware found that very few people were actually using slot policy, and those that were often weren’t using it correctly; it also involved manual recalculation when hosts were added or removed.

Admission control imposes constraints on resource usage. Any action that might violate these constraints is not permitted. Actions that might be disallowed include the following examples:

  • Powering on a virtual machine
  • Migrating a virtual machine
  • Increasing the CPU or memory reservation of a virtual machine

Computing the Current Failover Capacity

The total resource requirement for the powered-on virtual machines consists of two components, CPU and memory. vSphere HA calculates these values as follows.

  • The CPU component by summing the CPU reservations of the powered-on virtual machines. If you have not specified a CPU reservation for a virtual machine, it is assigned a default value of 32MHz (this value can be changed using the das.vmcpuminmhz advanced option.)
  • The memory component by summing the memory reservation (plus memory overhead) of each powered-on virtual machine.

The total host resources available for virtual machines are calculated by adding the hosts’ CPU and memory resources. These amounts are those contained in the host’s root resource pool, not the total physical resources of the host. Resources being used for virtualization purposes are not included. Only hosts that are connected, not in maintenance mode, and have no vSphere HA errors are considered.

The Current CPU Failover Capacity is computed by subtracting the total CPU resource requirements from the total host CPU resources and dividing the result by the total host CPU resources.

The Current Memory Failover Capacity is computed by subtracting the total memory resource requirements from the total host memory resources and dividing the result by the total host memory resources.
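As a concrete illustration (numbers invented): a cluster with 24GHz of total host CPU and 6GHz of summed CPU reservations has a Current CPU Failover Capacity of (24 - 6) / 24 = 75%; the memory figure is worked out in exactly the same way.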

Host Failure Cluster Tolerates 

This option allows you to define the number of ESXi host failures to tolerate. vSphere HA will automatically calculate the percentage of resources to reserve by applying the Percentage of Cluster Resources admission control policy (the default option in vSphere 6.5). The failover capacity reserved is now directly related to the Host Failures Cluster Tolerates option. In the example below, there are 2 ESXi hosts in the cluster and I have configured the Host Failures Cluster Tolerates value as 1; HA will then automatically reserve 50% of memory and 50% of CPU for failover capacity. If you have a 4-host cluster and 1 failure to tolerate, it will calculate a 25% reservation. Slot Policy used to be the default admission control policy; with vSphere 6.5, the default is now Cluster Resource Percentage.

If you add or remove ESXi hosts in the cluster, the percentage of failover capacity will be automatically recalculated.

You have the option to override the failover capacity derived from the number of host failures the cluster tolerates by selecting the Override option and specifying percentages for CPU and memory.
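For scripted setups, the equivalent PowerCLI is a one-liner; a sketch with a placeholder cluster name (HAFailoverLevel maps to the number of host failures the cluster tolerates):

Set-Cluster -Cluster (Get-Cluster "Cluster01") -HAAdmissionControlEnabled:$true -HAFailoverLevel 1 -Confirm:$false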

Define host failover capacity by HA Slot Policy

You also have the option to choose “Slot Policy”. This was the default option prior to vSphere 6.5. Slot size is defined as the memory and CPU resources that satisfy the reservation requirements for any powered-on virtual machine in the HA cluster. You have 2 options under Slot Policy:

  • Cover All powered-on Virtual Machines

It calculates the slot size based on the maximum CPU/memory reservation and overhead of all powered-on virtual machines in the cluster, but this can be skewed by large reservations on individual VMs, which is why the Fixed Slot Size option below exists to stop calculations being based on large reservations.

  • Fixed Slot Size

You can explicitly specify the fixed slot size.
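As a hypothetical illustration of the slot mechanics (the figures are purely illustrative): if the largest CPU reservation on any powered-on VM is 2GHz and the largest memory reservation plus overhead is 4GB, the slot size is 2GHz/4GB. A host contributing 12GHz and 48GB then holds min(12/2, 48/4) = 6 slots. vSphere HA totals the slots across all hosts, works out how many would remain after the worst-case host failure(s), and reports insufficient failover capacity if the powered-on VMs need more slots than would remain.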

 

Define host failover capacity by Dedicated Failover Hosts

This option allows you to designate one or more ESXi hosts in the cluster as dedicated failover hosts for the HA cluster. A dedicated failover host will not run virtual machines unless vSphere HA needs to recover from a failed host. However, this idles an entire host, so it is not generally used unless you have the capacity in your datacenter to keep a spare host aside just in case.

 

VM resource reduction event threshold – Performance Degradation Tolerance

The reserved capacity used by admission control ensures that all configured reservations will continue to be honored after a host failure. However, in environments where reservations are not widely used, actual consumption can exceed what admission control reserves, so cluster performance could still suffer after a failover. The new “VM resource reduction event threshold” setting defines how much of a performance impact is tolerated and will issue a warning if the consumed resources are higher than the reserved resources.

0% – Raises a warning if there is insufficient failover capacity to guarantee the same performance after VMs restart.

100% – The warning is disabled.

 

Example

400GB of memory available in a 4 node cluster
1 host failure to tolerate specified
310GB of memory actively used by VMs
0% resource reduction tolerated

This results in the following:
400GB – 100GB (1 host’s worth of memory) = 300GB available after a failure
We have 310GB of memory actively used, with 0% resource reduction tolerated
310GB needed but only 300GB available after a failure, so a warning will be issued.

Summary

Just some general points I’ve seen lately. I’ll update them if anything else comes up which is interesting.

  • In terms of larger VMs skewing the calculations, every CPU or memory reservation is taken into account, including the default values assumed for VMs that have no reservation.
  • Reservations only come into play when the system is under contention anyway; VMware is very good at managing resources. If you do want to use them, I would advise first monitoring the peak CPU and RAM usage of the VMs concerned so you can assign an accurate reservation.
  • Slot Policy is no longer the default option and is easy to set incorrectly, so be careful with it. You can create extra work for yourself, but it is useful.
  • Use the Host Failures Cluster Tolerates policy (the recommended policy now).
  • It may be worth setting the VM resource reduction event threshold so you are warned of potential performance problems. Setting it to 0% generates a warning if admission control thinks there is insufficient failover capacity to ensure the same performance after VMs are failed over, based on monitoring of actual CPU and RAM usage. Combined with the Host Failures Cluster Tolerates policy, which calculates using only reservations, default CPU/RAM values and overhead, these two settings give you a very useful monitoring capability for your cluster.

VMware Ruby vSphere Console for vSAN 6.6


The Ruby vSphere Console (RVC) is an interactive command-line console user interface for VMware vSphere and vCenter Server.

The Ruby vSphere Console comes bundled with both the vCenter Server Appliance (VCSA) and the Windows version of vCenter Server. RVC is quickly becoming one of the primary tools for managing and troubleshooting Virtual SAN environments.

How to begin

  • To begin using the Ruby vSphere Console to manage your vSphere infrastructure, deploy the vCenter Server Appliance and configure network connectivity for the appliance.
  • Afterwards, SSH to the vCenter Server Appliance using PuTTY or your preferred client and log in as a privileged user. No additional configuration is required to begin.
  • Commands such as ‘cd’ and ‘ls’ work fine, and if you want to return to the previous directory, type ‘cd ..’ and press Enter.

How to Login

RVC credentials are directly related to the default domain setting in SSO (Single Sign-On). Verify the default SSO Identity Source is set to the desired entity.

There are a few different ways to log on, either locally or with domain credentials. Examples below, followed by a sketch of what a typical session looks like:

  • rvc administrator@vsphere.local@localhost
  • rvc root@localhost
  • rvc administrator@techlab.local@localhost
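As a rough sketch of a typical session (the hostname and the numbered output are illustrative; you will be prompted for the SSO password):

ssh root@vcsa.techlab.local          # SSH to the VCSA first
rvc administrator@vsphere.local@localhost
password:
0 /
1 localhost/
>

The numbered entries behave like directories, which is what the navigation below relies on.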

Where to go from here

You are now at the root of the virtual filesystem.

  • To access and navigate through the system, type ‘cd 0’ to access the root (/) directory or ‘cd 1’ to access the ‘localhost/’ directory. You can type the ‘ls’ command to list the contents of a directory. I am going to type ‘cd 1’ to access my localhost directory, so let’s see what we have (a consolidated sample session is shown at the end of this list).

  • Type ls to see what directory structure we have now. You should now see your datacenter or datacenters.

  • Change into the relevant datacenter by typing cd 0 (using the number shown next to it in the ls output) and you will see the following folder structure.

  • Type ls to see the structure of this folder

  • Type cd 1 to change to the Computers folder where we will see the cluster and then type ls

  • We can now use a command to check the state of the vSAN cluster. Don’t enter the command ‘vsan.check_state vsan-cluster’, as that will not work; the number ‘0’ shown in the ls output is what you need to use, so type vsan.check_state 0

  • Next look at the vSAN Object Status Report. Type vsan.obj_status_report 0

  • We can also run the command vsan.obj_status_report 0 -t which displays a table with more information about vSAN objects

  • Next look at a detailed view of the cluster. Type vsan.cluster_info 0

  • Next we’ll have a look at disk stats. Type vsan.disks_stats 0

  • Next, have a look at simulating the failure of a host in your vSAN cluster. Type vsan.whatif_host_failures 0

  • You can also type vsan.whatif_host_failures -s 0

  • You can also view VM performance by typing vsan.vm_perf_stats “vm”. This command samples disk performance over a period of 20 seconds and provides read/write IOPS, throughput and latency.
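Pulling the steps above together, a consolidated sample session might look like this (the datacenter, cluster and VM names are from my lab and purely illustrative, and the numbers depend on your own ls output):

> cd 1                                   # enter localhost/
/localhost> ls
0 TechLab (datacenter)
/localhost> cd 0                         # enter the datacenter
/localhost/TechLab> ls
0 storage/
1 computers [host]/
2 networks [network]/
3 datastores [datastore]/
4 vms [vm]/
/localhost/TechLab> cd 1                 # enter computers
/localhost/TechLab/computers> ls
0 cluster (cluster): cpu 63 GHz, memory 384 GB
/localhost/TechLab/computers> vsan.check_state 0
/localhost/TechLab/computers> vsan.obj_status_report 0 -t
/localhost/TechLab/computers> vsan.cluster_info 0
/localhost/TechLab/computers> vsan.disks_stats 0
/localhost/TechLab/computers> vsan.whatif_host_failures 0
/localhost/TechLab/computers> vsan.vm_perf_stats ../vms/myvm     # samples for ~20 seconds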

Using vSAN Observer

Click me –> https://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2064240

To generate a performance statistics bundle over a one hour period at 30 second intervals for a vSAN cluster named vSAN and save the generated statistics bundle to the /tmp folder, run this command:

  • Log into rvc
  • Navigate down to Computers
  • Type the following: vsan.observer ~/computers/clustername(fill this in)/ --run-webserver --force --generate-html-bundle /tmp --interval 30 --max-runtime 1 (a concrete example follows this list)
  • While this is running, you can open http://vCentername:8010 in a web browser, which will provide multiple graphs and information you can view
  • Press Ctrl+C if you want to stop this before the run ends.
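For example, for a cluster named vSAN, the full command (run from within RVC, with the path per my lab layout) would be:

vsan.observer ~/computers/vSAN/ --run-webserver --force --generate-html-bundle /tmp --interval 30 --max-runtime 1

Here --interval is in seconds and --max-runtime is in hours, so this samples every 30 seconds, stops after one hour, and writes the HTML bundle to /tmp.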

Inaccessible objects or orphaned objects

If you get an issue like I did with an orphaned object, browse through the vSAN datastore in the Web Client, find the GUID of the object and run the following commands on the host. Take care you have the correct GUID! The first command checks the GUID and the second command deletes it.

  • /usr/lib/vmware/osfs/bin/objtool getAttr -u 5825a359-2645-eb1e-b109-002564f9b0c2
  • /usr/lib/vmware/osfs/bin/objtool delete -u 5825a359-2645-eb1e-b109-002564f9b0c2 -f -v 10
  • Give it a minute and you will see it vanish from your vSAN datastore.

Useful Commands

Within the RVC console, type vsan. and then press Tab twice to get the full list of vsan commands you can use.

On the hosts the following commands can be useful

  • /etc/init.d/vsanmgmtd status
  • /etc/init.d/vsanmgmtd restart

Using HCIBench v1.6.3 to performance test vSAN 6.6

vSAN Load Testing Tool: HCIBench

*Note* HCIBench is now on v1.6.6 – Use this version.

VMware has a vSAN stress and load testing tool called HCIBench, which is provided via VMware’s fling capability. HCIBench can be run against vSphere 5.5 and upwards as a replacement for the vSAN proactive tests which are built into vSAN currently. I am running this against vSphere 6.5/vSAN 6.6 today. HCIBench provides more flexibility in defining a target performance profile as input, and test results from HCIBench can be viewed in a web browser and saved to disk.

HCIBench will help simplify the stress testing task, as HCIBench asks you to specify your desired testing parameters (size of working set, IO profile, number of VMs and VMDKs, etc.) and then spawns multiple instances of Vdbench on multiple servers. If you don’t want to configure anything manually, there is a button called Easy Run which will set everything for you. After the test run is done, it conveniently gathers all the results in one place for easy review and resets itself for the next test run.

HCIBench is not only a benchmark tool designed for vSAN; it can also be used to evaluate the performance of all kinds of hyper-converged infrastructure storage in a vSphere environment.

Where can I can find HCI Bench?

There is a dedicated fling page which provides access to HCIBench and its associated documentation. You will also need to download a zip file containing the Vdbench binaries from Oracle, which can be done through the configuration page after the appliance is installed. You will need to register an account with Oracle to download this file, but this doesn’t take long.

HCIBench Download: labs.vmware.com/flings/hcibench

HCIBench User Guide: https://download3.vmware.com/software/vmw-tools/hcibench/HCIBench_User_Guide.pdf

Requirements

  • Web Browser: IE8+, Firefox or Chrome
  • vSphere 5.5 and later environments for deployment of both HCIBench and its client VMs

HCIBench Tool Architecture

The tool is specifically designed for running performance tests using Vdbench against a vSAN datastore.
It is delivered in the form of an Open Virtualization Appliance (OVA) that includes the following components:

The test Controller VM is installed with:

  • Ruby vSphere Console (RVC)
  • vSAN Observer
  • Automation bundle
  • Configuration files
  • Linux test VM template

The Controller VM has all the needed components installed. The core component is RVC (https://github.com/vmware/rvc) with some extended features enabled. RVC is the engine of this performance test tool, responsible for deploying Vdbench Guest VMs, conducting Vdbench runs, collecting results, and monitoring vSAN by using vSAN Observer.

VM Specification Controller VM

  • CPU: 8 vCPU
  • RAM: 4GB
  • OS VMDK: 16GB
  • Operating system: Photon OS 1.0
  • OS Credential: user is responsible for creating the root password when deploying the VM.
  • Software installed: Ruby 2.3.0, Rubygem 2.5.1, Rbvmomi 1.8.2, RVC 1.8.0, sshpass 1.05, Apache 2.4.18, Tomcat 8.54, JDK 1.8u102

Vdbench Guest VM

  • CPU: 4 vCPU
  • RAM: 4GB
  • OS VMDK: 16GB
  • OS: Photon OS 1.0
  • OS Credential: root/vdbench
  • Software installed: JDK 1.8u102, fio 2.13
  • SCSI Controller Type: VMware Paravirtual
  • Data VMDK: number and size to be defined by user

Pre-requisites

Before deploying this performance test tool packaged as OVA, make sure the environment meets the following requirements:

The vSAN Cluster is created and configured properly

  • The network for the Vdbench Guest VMs is ready and needs to have DHCP service enabled; if the network doesn’t have DHCP service, the “Private Network” must be mapped to the same network when HCIBench is deployed.
  • The vSphere environment where the tool is deployed can access the vSAN Cluster environment to be tested
  • The tool can be deployed into any vSphere environment. However, we do not recommend deploying it into the vSAN Cluster being tested, to avoid unnecessary resource consumption by the tool.

What am I benchmarking?

This is my home lab which runs vSAN 6.6 on 3 x Dell Poweredge T710 servers each with

  • 2 x 6 core X5650 2.66Ghz processors
  • 128GB RAM
  • 6 x Dell Enterprise 2TB SATA 7.2k hot plug drives
  • 1 x Samsung 256GB SSD Enterprise 6.0Gbps
  • Perc 6i RAID BBWC battery-backed cache
  • iDRAC 6 Enterprise Remote Card
  • NetXtreme II 5709c Gigabit Ethernet NIC

Installation Instructions

  • Download the HCIBench OVA from https://labs.vmware.com/flings/hcibench and deploy it to your vSphere 5.5 or later environment.
  • Because the vApp option is used for deployment, HCIBench doesn’t support deployment on a standalone ESXi host; the ESXi host needs to be managed by a vCenter Server.
  • When configuring the network, if you don’t have DHCP service on the VLAN that the Vdbench client VMs will be deployed on, the “Private Network” needs to be mapped to the same VLAN, because HCIBench can provide the DHCP service itself.
  • Log into vCenter and go to File > Deploy OVF File

  • Name the machine and select a deployment location

  • Select where to run the deployed template. I’m going to run it on one of my hosts’ local datastores, as it is recommended to run it somewhere other than the vSAN datastore being tested.

  • Review the details

  • Accept the License Agreement

  • Select a storage location to store the files for the deployed template

  • Select a destination network for each source network
  • Map the “Public Network” to the network through which HCIBench will be accessed; if the network prepared for the Vdbench Guest VMs doesn’t have DHCP service, map the “Private Network” to the same network, otherwise just ignore the “Private Network”.

  • Enter the network details. I have chosen static and filled in the detail as per below. I have a Windows DHCP Server on my network which will issue IP Addresses to the worker VMs.
  • Note: I added the IP Address of the HCIBench appliance into my DNS Server

  • Click Next and check all the details

  • The OVF should deploy. If you get a failure with the message “The OVF failed to deploy. The ovf descriptor is not available”, redownload the OVA and try again, and it should work.

  • Next, power on the Controller VM, open a web browser and navigate to http://<Your_HCIBench_IP>:8080 – in my case http://192.168.1.116:8080. Your IP is the address you gave it during the OVF deployment, or the DHCP address it picked up if you chose that option. If it asks you for a root password, it is normally what you set in the Deploy OVF wizard.
  • Log in with the root account details you set and you’ll get the Configuration UI

  • Go down the whole list and fill in each field. The screen-print shows half the configuration
  • Fill in the vCenter IP or FQDN
  • Fill in the vCenter Username as username@domain format
  • Fill in the vCenter Password
  • Fill in your Datacenter Name
  • Fill in your Cluster Name
  • Fill in the network name. If you don’t fill anything in here, it will assume “VM Network”. Note: this is my default network, so I left it blank.
  • You’ll see a checkbox for enabling DHCP Service on the network. DHCP is required for all the Vdbench worker VMs that HCIBench will produce, so if you don’t have DHCP on this network, you will need to check this box so addresses are assigned for you. As before, I have a Windows DHCP server on my network, so I won’t check this.

  • Next, enter the name of the datastore you want HCIBench to test; for example, I am going to put in vsanDatastore, which is the name of my vSAN datastore.
  • Select Clear Read/Write Cache Before Each Testing, which will make sure that test results are not skewed by any data lurking in the cache; it is designed to flush the cache tier prior to testing.
  • Next you have the option to deploy the worker VMs directly to the hosts or to have HCIBench leverage vCenter

If this parameter is unchecked, ignore the Hosts field below; the Host Username/Password fields can also be ignored if Clear Read/Write Cache Before Each Testing is unchecked. In this mode, a Vdbench Guest VM is deployed by vCenter and then cloned to all hosts in the vSAN Cluster in a round-robin fashion. The naming convention of Vdbench Guest VMs deployed in this mode is
“vdbench-vc-<DATASTORE_NAME>-<#>”.
If this parameter is checked, all the other parameters except EASY RUN must be specified properly.
The Hosts parameter specifies the IP addresses or FQDNs of the hosts in the vSAN Cluster on which the Vdbench Guest VMs will be deployed, and all these hosts should have the same username and password, specified in Host Username and Host Password. In this mode, Vdbench Guest VMs are deployed directly on the specified hosts concurrently. To reduce network traffic, five hosts run deployment at a time before moving on to the next five, and each host deploys five VMs at a time.

The naming convention of test VMs deployed in this mode is “vdbench-<HOSTNAME/IP>-<DATASTORE_NAME>-batch<VM#>-<VM#>”.

In general, it is recommended to check Deploy on Hosts when deploying a large number of test VMs. However, if a distributed switch port group is used as the client VM network, Deploy on Hosts must be unchecked.
EASY RUN is specifically designed for vSAN users. By checking this, HCIBench handles all the configuration below by identifying the vSAN configuration: EASY RUN decides how many client VMs should be deployed, the number and size of VMDKs for each VM, the way virtual disks are prepared before testing, and so on. The configuration options below are hidden if this option is checked.

  • You can omit all the host details and just click EASY RUN

  • Next, download the Vdbench zip file and upload it as it is. Note: you will need to create an Oracle account if you do not have one.

  • It should look like this. Click Upload

  • Click Save Configuration

  • Click Validate the Configuration. Note at the bottom it says “Deploy on hosts must be unchecked” when using fully automated DRS. As a result, I changed my cluster DRS setting to partially automated, and I then got the correct message below when I validated again.

  • If you get any issues, please look at the Pre-validation logs located here – /opt/automation/logs/prevalidation

  • Next we can start a Test. Click Test

  • You will see the VMs being deployed in vCenter

  • And more messages being shown

  • It should finish and say Test is finished

Results

  • Just as a note: after the first test it is worth checking that the VMs are spread evenly across all the hosts you are testing!
  • After the Vdbench testing finishes, the test results are collected from all Vdbench instances in the test VMs. You can view the results at http://HCIBench_IP/results in a web browser and/or by clicking the Results button from the testing window.
  • You can also click Save Result and save a zip file of all the results
  • Click on the easy-run folder

  • Click on the .txt file

  • You will get a summarized results file

  • Just as a note on the output above, the 95th Percentile Latency helps you understand that during 95% of the testing time, the average latency was below 46.336ms
  • Click on the other folder

  • You can also see the individual Vdbench VMs’ statistics by clicking into them

  • You can also navigate down to the vSAN Observer collection. Click on the stats.html file to display a vSAN Observer view of the cluster for the period of time that the test was running

  • You will be able to click through the tabs to see what sort of performance, latency and throughput was occurring.

  • Enjoy and check you are getting the results you would expect from your storage
  • The results volume only holds 200GB of results, so you may need to delete old results if it gets full. PuTTY into the appliance, go to /opt/output/results and use rm -Rf “filename”; an example follows below.
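For example (the appliance IP is the one from my lab and the result folder name is illustrative; yours will differ):

ssh root@192.168.1.116               # PuTTY/SSH into the HCIBench appliance
cd /opt/output/results
ls                                   # list the saved result folders
rm -Rf easy-run-1510672364           # delete an old result folder to free space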

Useful Links

  • Comments from the HCIBench fling site which may be useful for troubleshooting

https://labs.vmware.com/flings/hcibench/comments

  • If you have questions or need help with the tool, please email VSANperformance@vmware.com
  • Information about the back-end scripts in HCIBench, thanks to Chen Wei:

https://blogs.vmware.com/virtualblocks/2016/11/03/use-hcibench-like-pro-part-2/

An interesting point about VMs and O/S alignment – Do we still need this on vSAN and are there performance impacts?

https://blogs.vmware.com/vsphere/2014/08/virtual-san-block-alignment.html