vMA – Using HP Power Protector agent to shut down virtual hosts

The goal was to create a vMA which would send a shutdown signal to all the virtual hosts it knows about. I’ve assumed that ESX, and management agents that live within its service console, will eventually be phased out.

The steps…

  1. Download and install the vMA from VMware – fire up the VM and set up the basic networking required
  2. Create trusts within the vMA to each of the VMware hosts you wish to manage (sudo vifp addserver <server>)
  3. Install HP Power Protector (Linux agent) – you could substitute your own UPS client software
  4. Modify the shutdown script (may differ per vendor) to shut down the ESX boxes – “SDScript” (using hostops.pl)

The first step is easy enough. Find the download at VMware and install it (Deploy OVF Template…)

Once up and running you need to create the trust between the vMA and your ESX hosts. Log on using vi-admin and your password, then:

sudo vifp addserver esxbox1 (you will be prompted for the host's root password)

Do this for each box you wish the vMA to manage.
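
If you have more than a couple of hosts, a quick loop saves some typing (the host names here are just examples):

for host in esxbox1 esxbox2 esxbox3; do sudo vifp addserver $host; done

You will still be prompted for each host's root password in turn.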

Once you have installed your UPS agent into the vMA (the Linux client should work without issue) the next step is to modify the shutdown script to do the work. The script will need the following in it…

vifpinit esxbox1

This sets the context of the vMA to this host. Note: the server should have been added as one of the managed servers (step 2 above).

/usr/lib/vmware-vcli/apps/host/hostops.pl --target_host esxbox1 --operation enter_maintenance --url https://esxbox1/sdk/vimService.wsdl

Note: you need to use the actual name of the host and not its IP. You can get the exact name of the host from the vMA with vifp listservers.

The HP power protector agent script is located at /usr/local/DevMan/SDScript
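
Putting it together, the shutdown section added to SDScript ends up looking something like the sketch below. Treat it as a rough outline only – the host names are examples, and you simply repeat the pair of commands for each managed host, leaving the host that runs the vMA until last:

# Sketch only: repeat for each vifp-managed host, with the vMA's own host last
vifpinit esxbox1
/usr/lib/vmware-vcli/apps/host/hostops.pl --target_host esxbox1 --operation enter_maintenance --url https://esxbox1/sdk/vimService.wsdl

vifpinit esxbox2
/usr/lib/vmware-vcli/apps/host/hostops.pl --target_host esxbox2 --operation enter_maintenance --url https://esxbox2/sdk/vimService.wsdl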

If you want to ensure all your guests shut down cleanly, enable “shutdown virtual machines on host shutdown”. Also note that if you shut down the host that the vMA is running on, it will kill the script. So shut down the host that the vMA is running on last (and remove the vMA from DRS so it doesn't get moved).

Update (08/04/2010): I have found the above vMA approach to be quite fiddly, and have had much better luck with the PowerCLI code found on this page: http://www.virtu-al.net/2010/01/06/powercli-shutdown-your-virtual-infrastructure/

I have made some slight mods but essentially…

$VCENTER = "vcenter"
Connect-VIServer $VCENTER

# Get All the ESX Hosts
$ESXSRV = Get-VMHost

# For each of the VMs on the ESX hosts (excluding virtual center box)
Foreach ($VM in ($ESXSRV | Get-VM)){
    # Shut down the guest cleanly (skipping the vCenter VM itself)
    if ($VM.Name -notmatch $VCENTER) {$VM | Shutdown-VMGuest -Confirm:$false}
}

# Set the amount of time to wait before assuming the remaining powered on guests are stuck

$WaitTime = 120 #Seconds

do {
    # Wait for the VMs to be Shutdown cleanly
    sleep 1.0
    $WaitTime = $WaitTime - 1
    $numvms = ($ESXSRV | Get-VM | Where { $_.PowerState -eq "poweredOn" }).Count
    Write "Waiting for shutdown of $numvms VMs or $WaitTime seconds"
   
} until (@($ESXSRV | Get-VM | Where { $_.PowerState -eq "poweredOn" }).Count -eq 0 -or $WaitTime -eq 0)

# Shutdown the ESX Hosts - and remaining virtual center box (if virtual)
$ESXSRV | Foreach {Get-View $_.ID} | Foreach {$_.ShutdownHost_Task($TRUE)}

Write-Host "Shutdown Complete"

# If the virtual center box is physical and still alive it will need to be shut down...

Write-Host "Shutting down virtual center"
shutdown -s -f -t 1

VMware – HA issues

Most of the time your HA issues are going to be DNS related, so ensure that your vCenter can ping all your hosts by FQDN without issue. In some cases though, a stubborn server may not want to play the game even when everything is configured properly.
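
It is also worth checking resolution from the hosts themselves, as the HA agents on each host need to resolve one another as well. A quick sanity check from a service console, with example FQDNs, might look like:

for h in esxbox1.example.local esxbox2.example.local vcenter.example.local; do ping -c 1 $h; done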

This method is considered a “last effort” as you’ll need to run some CLI commands on the ESX box, but I have found it useful in a few situations.

This page has a great write up on which files HA uses and how to temporarily stop the HA service: http://itknowledgeexchange.techtarget.com/virtualization-pro/vmware-ha-failure-got-you-down/

Remember, to get to the console on ESXi press Alt-F1, type “unsupported” (note: you cannot see what you are typing), then enter the root password.

The main bits are as follows;

Stop the HA service

service vmware-aam stop

Check that HA has stopped (if not, use the kill command to kill the remaining processes – see the one-liner below)

ps ax | grep aam | grep -v grep
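
If anything is left over, a one-liner along these lines will clean it up (kill -9 being very much a last resort):

ps ax | grep aam | grep -v grep | awk '{print $1}' | xargs kill -9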

Move the current HA config files to a backup directory (before restarting HA)

cd /etc/opt/vmware/aam

mkdir .old

mv * .old

mv .[a-z]* .old

Then head back to your vCenter and select Reconfigure for VMware HA on the affected host. Fingers crossed that it starts up and reconfigures without any issues.

vSphere – ctrl-alt-del greyed out

This bug has hit me. Looks like users with roles like vm user / power user cannot send “ctrl-alt-del” via the console even though they have the correct permissions. Our users cannot use ctrl-alt-ins as they are connected via RDP to a machine that has the console installed.

Found this : http://communities.vmware.com/thread/220683;jsessionid=480C8A2C9B9EACA9FF2BB4E1BECA2D53?start=15&tstart=0

Looks like it’s a known bug and will be fixed in the upcoming VC4.0 Update 1 sometime Q3 2009 :(

Luckily vSphere was only set up in our pre-production environment – the machines I have running in production are still on 3.5 with VC2.5.

ZFS compression and latency

Since I’m using ZFS as storage via NFS for some of my VMware environments, I need to ensure that latency on my disks is reduced wherever possible.

There is a lot of talk about ZFS compression being “faster” than a non-compressed pool due to less physical data being pulled off the drives. This of course depends on the system powering ZFS, but I wanted to run some tests specifically on latency. Throughput is fine in some situations, but latency is a killer when it comes to lots of small reads and writes (as in the case of hosting virtual machines).

I recently completed some basic tests focusing on the differences in latency when ZFS compression (lzjb) is enabled or disabled. IOMeter was my tool of choice and I hit my ZFS box via a mapped drive.
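
For reference, toggling compression between runs is a single property change per dataset – the pool/dataset name below is just an example:

zfs set compression=lzjb tank/vmstore
zfs set compression=off tank/vmstore
zfs get compression tank/vmstore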

I’m not concerned with the actual figures so much as the difference between them.

I have run the test multiple times (to eliminate caching as a factor) and can validate that compression (on my system anyhow) increases latency.

Basic results from an “All in one” test suite… (similar results across all my tests)

ZFS uncompressed:

IOps : 2376.68
Read MBps : 15.14
Write MBps : 15.36
Average Response Time : 0.42
Average Read Response Time : 0.42
Average Write Response Time : 0.43
Average Transaction Time : 0.42

ZFS compressed: (lzjb)

IOps : 1901.82
Read MBps : 12.09
Write MBps : 12.28
Average Response Time : 0.53
Average Read Response Time : 0.44
Average Write Response Time : 0.61
Average Transaction Time : 0.53

As you can see from the results, the average write response time especially is much higher with compression enabled. I wouldn’t recommend using ZFS compression where latency is a large factor (e.g. hosting virtual machines).

Note: under all the tests performed the CPU (dual core) on the ZFS box never hit 100%, eliminating that as a bottleneck.