All out of HA Slots

A few weeks ago I was moving a customer from an old set of ESX servers (not HA clustered) to a new infrastructure of clustered ESX hosts. After building, testing, and verifying the hosts, we started moving the VMs. After a little while it became apparent that there were some resource issues: after just a few VMs were moved, an alert appeared saying we could not start any new machines. I started looking at the cluster and there was plenty of spare memory and CPU. Still, nothing would start.

I said to myself, “Self, we have read about this before.” I thought back to the HA Deep Dive article by Duncan Epping.

Let’s check the HA slots! (On a side note, if you use HA and have never read the Deep Dive, go do it now!)

media_1276972861425.png

As you can see here, the slot size is rather giant. HA takes the largest CPU reservation and the largest memory reservation plus some overhead (simplifying a bit), and that blows the size of the slot way up. With slots that big, each host only holds a handful of them, so HA ran out of slots long before the cluster ran out of actual memory or CPU. I didn’t set the reservations, but sure enough, there they were: 8GB of reserved memory and 4000MHz of CPU. Ouch. Where did they come from? They followed the VM from the old host to the new one. One of the reasons I was there was to set up a new cluster, since the older hosts were performing so slowly on local storage. It seems someone had tried to help some critical VMs along the way by adding reservations. I removed the reservations and had plenty of slots, as you see below.

media_1276973677553.png

Yeah! I was able to power on another VM!
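To make the slot math concrete, here is a rough Python sketch of the simplified rule described above: the slot is sized by the largest CPU reservation and the largest memory reservation plus overhead, and each host holds as many slots as fit in both dimensions. Only the 8GB / 4000MHz reservation comes from this story. The host capacity, the overhead figures, the 256MHz fallback used when no VM has a CPU reservation, and all of the names in the code are assumptions for illustration, so treat it as a back-of-the-envelope model and not as how HA actually computes things in your version.

```python
from dataclasses import dataclass

# Assumed fallback CPU slot size when no VM has a reservation.
# This value varies by vSphere version; check yours.
DEFAULT_CPU_MHZ = 256


@dataclass
class VM:
    name: str
    cpu_reservation_mhz: int = 0
    mem_reservation_mb: int = 0
    mem_overhead_mb: int = 0  # hypothetical per-VM overhead


def slot_size(vms, default_cpu_mhz=DEFAULT_CPU_MHZ):
    """Return (cpu_mhz, mem_mb) for one slot under the simplified rule."""
    largest_cpu = max((vm.cpu_reservation_mhz for vm in vms), default=0)
    cpu_mhz = largest_cpu if largest_cpu > 0 else default_cpu_mhz
    mem_mb = max(
        (vm.mem_reservation_mb + vm.mem_overhead_mb for vm in vms), default=0
    )
    return cpu_mhz, mem_mb


def slots_per_host(host_cpu_mhz, host_mem_mb, slot):
    """A host holds as many slots as fit in BOTH CPU and memory."""
    cpu_mhz, mem_mb = slot
    cpu_slots = host_cpu_mhz // cpu_mhz
    if mem_mb == 0:  # no reservations and no overhead: CPU is the only limit
        return cpu_slots
    return min(cpu_slots, host_mem_mb // mem_mb)


# Hypothetical host: 31920 MHz of CPU and 48GB of memory.
HOST_CPU_MHZ, HOST_MEM_MB = 31920, 49152

# One VM carries the reservation from the story; the others have none.
before = [
    VM("critical", cpu_reservation_mhz=4000, mem_reservation_mb=8192,
       mem_overhead_mb=300),
    VM("web01", mem_overhead_mb=100),
    VM("web02", mem_overhead_mb=100),
]
# Same VMs after the reservation is removed.
after = [
    VM("critical", mem_overhead_mb=300),
    VM("web01", mem_overhead_mb=100),
    VM("web02", mem_overhead_mb=100),
]

slots_before = slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, slot_size(before))
slots_after = slots_per_host(HOST_CPU_MHZ, HOST_MEM_MB, slot_size(after))
print("slots per host with the reservation:", slots_before)
print("slots per host without it:", slots_after)
```

The point of the sketch is that a single VM with a big reservation sets the slot size for every VM in the cluster, which is why one forgotten setting was enough to use up all the slots.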

The new cluster blew away the old one. We went from older Xeons to 6-core Nehalems, and from local disks to 48 disks of EqualLogic storage. The reservations were no longer needed.

Lessons:

  1. Be careful with reservations; they can impact your failover capacity.

  2. Reservations set on the machine will follow it to a new host.

Written on July 1, 2010