ESX – network utilization

One of the best articles I have found on this subject is here : http://blog.scottlowe.org/2008/07/16/understanding-nic-utilization-in-vmware-esx/

There is some additional information here on setting up an EtherChannel on the Cisco side : http://blog.scottlowe.org/2006/12/04/esx-server-nic-teaming-and-vlan-trunking/

This can be handy if you need a single VM to use both physical NICs in a load-balanced manner, both outbound and inbound. Of course, it's not really that simple. This will only add a benefit if the VM is communicating with multiple destinations; with IP hash, a single VM with one IP talking to a single destination will always be limited to the same physical NIC.

switch(config)#int port-channel 1
switch(config-if)#description NIC team for ESX server
switch(config-if)#int gi0/1
switch(config-if)#channel-group 1 mode on
switch(config-if)#int gi0/2
switch(config-if)#channel-group 1 mode on
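
Once both ports are in the channel group it is worth checking that the bundle actually formed. This is just the standard IOS verification command, not something specific to the articles above:

switch#show etherchannel summary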

As per the article, ensure you are using the same EtherChannel load-balancing method on both sides. The first command below shows your current load-balance method; the second changes it to IP hash.

show etherchannel load-balance
port-channel load-balance src-dst-ip
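
To put that in context, the show command runs from enable mode and the load-balance command is a global configuration command; a quick sketch using the same prompt style as above:

switch#configure terminal
switch(config)#port-channel load-balance src-dst-ip
switch(config)#end
switch#show etherchannel load-balance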

Another solution is to use multiple iSCSI paths. This is newly supported within vSphere; see this post on setting up multiple paths : http://goingvirtual.wordpress.com/2009/07/17/vsphere-4-0-with-software-iscsi-and-2-paths/
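
As I understand it, that post boils down to binding two vmkernel ports to the software iSCSI adapter. Something along these lines should do it from the ESX console (vmk1, vmk2 and vmhba33 are placeholders; substitute your own vmkernel ports and adapter name):

esxcli swiscsi nic add -n vmk1 -d vmhba33
esxcli swiscsi nic add -n vmk2 -d vmhba33
esxcli swiscsi nic list -d vmhba33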

Here is another good article on iSCSI within vSphere : http://www.delltechcenter.com/page/A+“Multivendor+Post”+on+using+iSCSI+with+VMware+vSphere

Some important points on using EMC Clariion with vSphere : http://virtualgeek.typepad.com/virtual_geek/2009/08/important-note-for-all-emc-clariion-customers-using-iscsi-and-vsphere.html

zfs compression and latency

Since I'm using ZFS as storage via NFS for some of my VMware environments, I need to ensure that latency on my disks is reduced wherever possible.

There is a lot of talk about ZFS compression being “faster” than a non-compressed pool due to less physical data being pulled off the drives. This of course depends on the system powering ZFS, but I wanted to run some tests focused specifically on latency. Throughput is fine in some situations, but latency is a killer when it comes to lots of small reads and writes (as is the case when hosting virtual machines).

I recently completed some basic tests focusing on the differences in latency when ZFS compression (lzjb) is enabled or disabled. IOMeter was my tool of choice, and I hit my ZFS box via a mapped drive.
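
For anyone wanting to repeat this, compression is just a dataset property, so toggling it between runs is straightforward. tank/vmstore is a made-up dataset name here; note that the setting only affects newly written blocks, so test files need to be rewritten after changing it:

zfs set compression=lzjb tank/vmstore
zfs set compression=off tank/vmstore
zfs get compression,compressratio tank/vmstore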

I'm not concerned with the actual figures, but rather the difference between them.

I have run the test multiple times (to eliminate caching as a factor) and can confirm that compression (on my system anyhow) increases latency.

Basic results from an “All in one” test suite… (similar results across all my tests)

ZFS uncompressed:

IOps : 2376.68
Read MBps : 15.14
Write MBps : 15.36
Average Response Time : 0.42
Average Read Response Time : 0.42
Average Write Response Time : 0.43
Average Transaction Time : 0.42

ZFS compressed: (lzjb)

IOps : 1901.82
Read MBps : 12.09
Write MBps : 12.28
Average Response Time : 0.53
Average Read Response Time : 0.44
Average Write Response Time : 0.61
Average Transaction Time : 0.53

As you can see from the results, the average write response time especially is much higher with compression enabled. I wouldn't recommend using ZFS compression where latency is a significant factor (such as hosting virtual machines).

Note: under all the tests performed, the CPU (dual core) on the ZFS box never hit 100%, eliminating that as a bottleneck.
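
If you want to keep an eye on CPU and disk while a run is going, the stock OpenSolaris tools are enough (tank being a placeholder pool name):

vmstat 5
zpool iostat -v tank 5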

opensolaris – network teaming

Otherwise known as trunking or link aggregation. I believe it is the best way to get an additional boost out of your network server while providing a bit of redundancy on link failure. Here is how to do it…

Official docs on the process are here… http://docs.sun.com/app/docs/doc/819-6990/gdysn?a=view and some good bits here… http://blogs.sun.com/nickyv/entry/link_aggregation_jumpstart_post_install

dladm (data link admin) is the tool for the job. List the links you currently have…

dladm show-link

First, shut down the links you are about to aggregate (you will have to do this on the console, since this drops your network connection)…

ifconfig e1000g1 unplumb
ifconfig rge0 unplumb

Now join the two NICs into one aggregated connection…

dladm create-aggr -l e1000g1 -l rge0 aggr1

Then bring up the new aggregated link (replace IP-address with your address)

ifconfig aggr1 plumb IP-address up

Show the aggregation

dladm show-aggr

(Optional) Make the IP configuration of the link aggregation persist across reboots.

  1. Create the /etc/hostname file for the aggregation’s interface.

    If the aggregation contains IPv4 addresses, the corresponding hostname file is /etc/hostname.aggr1. For IPv6-based link aggregations, the corresponding hostname file is /etc/hostname6.aggr1.

  2. Type the IPv4 or IPv6 address of the link aggregation into the file (see the example after this list).

  3. Perform a reconfiguration boot.
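
For example, for an IPv4 aggregation using a made-up address of 192.168.1.10:

echo 192.168.1.10 > /etc/hostname.aggr1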

I have teamed an Intel NIC (e1000g) and an rge NIC together without any issues… the rge driver by itself had issues, but I have not come across them again since I trunked both interfaces together. Perhaps the e1000g takes the load while the other NIC dies off.

Updated : 4/08/2009

To test the throughput / load balancing, run these commands (in two terminal sessions);

dladm show-link -s -i 5 rge0

dladm show-link -s -i 5 e1000g1

They will return the packet and byte counters for each NIC. Copy some files back and forth over the interface and watch the numbers. RBYTES and OBYTES are the fields to watch (received and outbound bytes).
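
You can also point the same command at the aggregation itself to see the combined totals (assuming it was named aggr1 as above):

dladm show-link -s -i 5 aggr1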

selinux – opening additional ports or disabling it

If you are having problems starting Apache on a non-standard port, you might find that the problem is related to SELinux.

Type this command to check which HTTP ports are currently allowed (remove the grep filter to show all rules);

semanage port -l | grep http

To add another port, type the following (substituting the port you wish to add);

semanage port -a -t http_port_t -p tcp 81
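
Re-running the listing should now show the extra port against http_port_t, and Apache needs a restart to pick it up (assuming a Red Hat style init script):

semanage port -l | grep http_port_t
service httpd restart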

If you want to disable SELinux completely, edit /etc/selinux/config and set SELINUX=disabled. Save, then reboot.
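
If you would rather not reboot (or not turn it off permanently), you can switch it to permissive mode on the fly; this does not survive a reboot:

getenforce
setenforce 0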

opensolaris – smbd issues?

Hmm… I've been having problems with CIFS since the 2009.06 (snv_111b) update.

Can't pin it down exactly, as it could be “load” related… hmmm.

Found this : http://opensolaris.org/jive/thread.jspa?threadID=107681 and this may also be a clue : http://opensolaris.org/jive/thread.jspa?threadID=92472&tstart=75

imapd? I might have to go back to 2008.11.

You might get better performance if you enable oplocks. There are known issues with it, but you can turn it on just to see if you notice any difference:

svccfg -s smb/server setprop smbd/oplock_enable=boolean: true
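
As far as I know an SMF property change only lands in the repository until the service re-reads it, so follow it up with a refresh and restart:

svcadm refresh smb/server
svcadm restart smb/server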

So far, running the above command has fixed things for me. I'll update if the problem returns.


Updated : 27/07/2009

The problem came back, so I'm updating to build 117 as per the comments below.
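
For reference, moving a 2009.06 install onto a dev build is done by pointing pkg at the dev repository and running an image update; double-check the repository URL before relying on this:

pkg set-publisher -O http://pkg.opensolaris.org/dev opensolaris.org
pkg image-update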