Friday, 28 October 2011

HA Failover Capacity

We get an enormous amount of questions about VMware’s HA (High Availability), especially when users see a message stating there are Insufficient resources to satisfy HA failover. We have already discussed the mechanism that HA uses to provide high availability here. Now we need to understand capacity calculations. In current versions of ESX (3.02) and earlier the following calculation applies for failover capacity.
HA Failover Capacity

Failover Capacity is determined using a slot size value that is calculated on the cluster. Slots are calculated by a combination of the total CPU and Memory that are in the physical hosts. The calculation for failover capacity works as follows:

Let’s say you have 4 ESX servers in your VMware HA cluster and Configured Failover capacity on the cluster is set to 1.

Physical memory in the hosts is as follows:

ESX1 = 16 GB
ESX2 = 24 GB
ESX3 = 32 GB
ESX4 = 32 GB

In the cluster you have 24 VM’s each configured and running. Of the 24 VM’s running, determine the VM which has the highest “configured memory”. For this example let’s say this is 2GB. All other VMs are configured with less or equal to 2GB.

With this information we can now do the calculation:
1. Pick the ESX host which has the least amount of RAM. In this case it is ESX1 and the minimum amount of RAM is = 16 GB

2. Divide the value found in step 1 with value for the maximum RAM in a VM. In my example this gives us 8 (16 divided by 2). This means we have 8 slots available per ESX host in the cluster.

3. Since we have 4 hosts and the configured failover capacity for the cluster is 1, we are left with 3 hosts in a failure situation. Hence the total number of VMs that can be powered on these 3 servers is 24 VMs. (i.e. 8 multiplied by 3 = 24)

4. If the total number of VMs in the cluster exceeds 24 then it will give us “Insufficient resources to satisfy HA failover” and the “current failover capacity will be shown as 0″. If the number is less than 24, we should not get this message.

Note: If you are still seeing the message and you have less VM’s running than in the calculation allows for, check both the CPU and Memory reservations on both VM’s andresource pools, as this can skew the calculation. You should avoid unnecessary memory or cpu reservations on VM’s as this can cause these types of errors to occur, because we have to ensure that the resource is available.

There are multiple ways to fix, or get around this calculation. The most common are as follows:
• Set the “Allow Virtual Machines to be powered on even if they violate availability constraints” in the configuration of the cluster. In this case it ignores the above calculation and will try to power on as many VM’s as possible in case of HA failover. If this is the option chosen you can also set restart priority in the ‘Virtual Machine Options’ section of the cluster configuration. This way any high priority VM’s are powered on first, and then the lower priority up to the point where we cannot power any further VM’s on
• If you have one VM which is configured with a very high amount of memory, you can either lower its configured memory, or take it out of the cluster and run it on any other standalone ESX host. This will increase the number of slots available with the current hardware
• Increase the amount of RAM on servers so that there are more slots available with the current RAM reservations.
• Remove any CPU reservations on any VM(s) that are greater than the max speed of the processors in the hosts. For example if the CPU Usage on the summary tab of your ESXServer shows as follows:

Then you will see the error message popup if you have a CPU reservation greater than 2793MHz on a VM.
Note: The above calculation method is very limited and is going to be revised in future releases of VirtualCenter to improve calculations for HA failover.


Post a Comment