Tuesday, June 9, 2009

The hypervisor processor is not being utilized

Recently, I have answered this question in the forums quite a bit.

The basic situation is: the processor within a virtual machine is running at 100%, but the host processors are sitting there, twiddling their thumbs and only using 5% (for example).

The next response I usually see is: "How can I tweak this so it behaves more like what I see when I am not running in a VM?"

First of all, stop there. Type I hypervisors (XenServer, ESX, Hyper-V, etc.) all have to manage the pool of physical resources. It is all about isolating, containerizing, and sharing the physical resources.

If a guest goes to 100% on a processor, the host should not go to 100% on a processor.
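Part of the mismatch is simple arithmetic. The guest reports utilization against the virtual processors it was given, while a hypervisor-level view of the host measures against every logical processor in the box. Here is a minimal sketch, assuming a hypothetical host with 16 logical processors and ignoring hypervisor overhead and any configured limits:

```python
def host_share(guest_pct: float, guest_vcpus: int, host_lps: int) -> float:
    """Percent of total host capacity consumed by one guest.

    guest_pct:   utilization the guest reports across its own virtual processors (0-100).
    guest_vcpus: number of virtual processors assigned to the guest.
    host_lps:    number of logical processors on the physical host.
    """
    if not 0 <= guest_pct <= 100:
        raise ValueError("guest_pct must be between 0 and 100")
    if guest_vcpus > host_lps:
        raise ValueError("a guest cannot have more vCPUs than the host has logical processors")
    # Each fully busy virtual processor occupies at most one logical processor's worth of time.
    busy_lps = guest_vcpus * guest_pct / 100
    return 100 * busy_lps / host_lps


print(host_share(100, 1, 16))  # 6.25
print(host_share(100, 2, 16))  # 12.5
```

So a single-processor guest pegged at 100% shows up as only a few percent of the whole host, the same order of magnitude as the 5% in the example above.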

A Type I hypervisor is a full hypervisor: all of the physical processors have been virtualized, and the hypervisor moves the work of the virtual processors between the physical processors to balance out the load.

This is to maintain the integrity of the entire virtual environment and to prevent a single VM from hogging the entire system.

What you see with Hyper-V, you should also see with ESX, XenServer, Virtual Iron, etc.

You will see different results with Virtual PC, Virtual Server, and VMware Server, because they are not full hypervisors. They are hosted virtualization solutions and share the physical resources in a different way.

Here is a scenario: what if VM processor utilization were dynamic, so that each VM was allowed to take more from the host as it needed it?

Then, if a VM spikes, a host processor spikes too.
As soon as you have more than one VM, all the other VMs now lose.

And if a second VM does the same thing, the remaining VMs lose even more.

In the meantime, the poorly written application that caused the processor spike in the first place is taking resources from all the other users sharing the pool of physical resources, for no good reason. He is just being a hog.
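The "take what you need" scenario can be sketched as a toy model, assuming (hypothetically) that an oversubscribed host simply splits its cores in proportion to what each VM asks for. This is not how any particular hypervisor schedules; it only illustrates why unbounded demand hurts the neighbors.

```python
def allocate_by_demand(capacity: float, demands: dict[str, float]) -> dict[str, float]:
    """Toy 'take what you need' policy with no per-VM bounds.

    capacity: physical cores available on the host.
    demands:  cores each VM is currently asking for.
    """
    total = sum(demands.values())
    if total <= capacity:
        # No contention: every VM simply gets what it asks for.
        return dict(demands)
    # Oversubscribed: split the cores in proportion to demand,
    # so whoever asks for the most takes the most.
    return {name: capacity * want / total for name, want in demands.items()}


host_cores = 4.0
one_hog = {"hog": 4.0, "web": 1.0, "db": 1.0}
two_hogs = {"hog": 4.0, "hog2": 4.0, "web": 1.0, "db": 1.0}

for label, demands in [("one hog", one_hog), ("two hogs", two_hogs)]:
    grants = allocate_by_demand(host_cores, demands)
    print(label, {name: round(cores, 2) for name, cores in grants.items()})
# one hog {'hog': 2.67, 'web': 0.67, 'db': 0.67}
# two hogs {'hog': 1.6, 'hog2': 1.6, 'web': 0.4, 'db': 0.4}
```

The web and db VMs each asked for one core and would get it on their own. With one hog they get two-thirds of a core, and with two hogs less than half a core, even though neither of them changed.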

Also, think of the operating system that you log in to at the console as a VM as well. He also has to share the pool of physical resources. So if a single VM is allowed to spike a physical processor, then the host itself also loses and is not able to respond to all the other VMs running on it, including the hog VM.

From there it is just a downward spiral into an impending crash of the entire host and all of its VMs.

This is the hypervisor model. All machines running on the hypervisor must share (play nice) with each other, or everyone loses.

So each machine is placed into a container, and that container is bounded.

These bounds can be modified on a VM-by-VM basis. If you have a single host running only a couple of VMs, then playing with these settings generally does no harm. As soon as you scale and add more and more VMs, this tweaking gets out of hand very quickly.
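A bounded container can be sketched the same way. Assumptions: each container, including the management OS at the console, gets a hypothetical upper bound in cores (here, roughly its number of virtual processors) and is never granted more than that. Notice that the runaway VM sits at 100% of its own container while the host still has idle cores, which is exactly the situation people ask about in the forums.

```python
def allocate_bounded(
    capacity: float, demands: dict[str, float], bounds: dict[str, float]
) -> tuple[dict[str, float], float]:
    """Toy bounded-container policy.

    capacity: physical cores on the host.
    demands:  cores each container is asking for.
    bounds:   hypothetical per-container upper limit, in cores.
    Returns (cores granted per container, cores left idle).
    """
    # A container never receives more than its bound, however much it asks for.
    grants = {name: min(demands.get(name, 0.0), bound) for name, bound in bounds.items()}
    total = sum(grants.values())
    if total > capacity:
        # Still oversubscribed after bounding: scale every grant down proportionally.
        grants = {name: g * capacity / total for name, g in grants.items()}
        total = capacity
    return grants, capacity - total


# "host" is the management OS at the console, treated as just another container.
bounds = {"host": 1.0, "hog": 1.0, "web": 2.0, "db": 2.0}
demands = {"host": 0.5, "hog": 8.0, "web": 1.0, "db": 1.5}

grants, idle = allocate_bounded(8.0, demands, bounds)
print(grants)  # {'host': 0.5, 'hog': 1.0, 'web': 1.0, 'db': 1.5}
print(idle)    # 4.0
```

The hog asked for the whole machine and got its one core; everyone else got exactly what they asked for.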

You tweak VM A in a positive way, which in turn has a negative impact on VMs B and C. So you compensate and tweak VMs B and C, which in turn has an impact on VM A again. And you end up tweaking the environment to death.
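The tweak-and-compensate loop is easy to see with relative weights, one common kind of per-VM setting (vendors use names like shares or relative weight; the values below are hypothetical). Assuming every VM wants more processor than it can get, each VM's slice is its weight divided by the total, so raising one VM's weight necessarily shrinks everyone else's.

```python
def share_by_weight(capacity: float, weights: dict[str, int]) -> dict[str, float]:
    """Split contended capacity in proportion to hypothetical per-VM weights.

    Assumes every VM wants more than it can get, so the whole capacity is handed out.
    """
    total = sum(weights.values())
    return {name: capacity * w / total for name, w in weights.items()}


cores = 6.0
print(share_by_weight(cores, {"A": 100, "B": 100, "C": 100}))  # {'A': 2.0, 'B': 2.0, 'C': 2.0}
# Tweak A upward: A gains, B and C both lose.
print(share_by_weight(cores, {"A": 200, "B": 100, "C": 100}))  # {'A': 3.0, 'B': 1.5, 'C': 1.5}
# Compensate by raising B too: A drops again and C loses even more.
print(share_by_weight(cores, {"A": 200, "B": 200, "C": 100}))  # {'A': 2.4, 'B': 2.4, 'C': 1.2}
```

Because the slices always add up to the same total, there is no setting that helps one VM without costing another, which is why the tweaking never ends.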

The recommendation from all hypervisor vendors is to not mess with the default settings unless absolutely necessary.  And if you do, document it very well.

Now, if you have a single VM that is misbehaving, then you need to dive into that particular VM (just like a physical server) to determine why he is spiking the processor. Is it an application? Is it threading? Is it device drivers? Was the VM converted from another platform or a physical installation?

There are tons of factors. But always begin by looking at the application or process that is consuming the processor, and work outward from there.
