ZFS compression types

ZFS compression as of OpenSolaris 2008.11 has a few types to choose from.

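lzjb (what you get with compression=on) | gzip (same as gzip-6) | gzip-[1-9]

You pick one with the zfs set command, for example zfs set compression=gzip poolname. The setting only applies to data written after the change; blocks already on disk stay as they were.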

The following test was quickly done out of personal interest – and is in no way scientific!

I have an AMD CPU with 3 cores (2.4 GHz). The data I copied to each of the shares was a mix of video, documents, pictures and music. This first test looks at compression only (I have not measured throughput).

Original data size: 412 MB

lzjb: 312 MB (compression ratio 1.32)

gzip: 293 MB (compression ratio 1.41)

gzip-9: 292 MB (compression ratio 1.41)
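
The ratios above are simply the original size divided by the compressed size. As a quick sanity check, here is a small shell sketch that reproduces them. The function name ratio is my own helper for illustration, not a ZFS tool.

```shell
# ratio ORIGINAL COMPRESSED: print ORIGINAL / COMPRESSED to two decimals.
# "ratio" is a hypothetical helper name, not part of ZFS.
ratio() {
    awk -v o="$1" -v c="$2" 'BEGIN { printf "%.2f\n", o / c }'
}

ratio 412 312   # lzjb
ratio 412 293   # gzip
ratio 412 292   # gzip-9
```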

gzip is the winner on compression. With such a small sample it is hard to say whether the extra CPU overhead of gzip-9 on a ZFS file system is worth it, but from these results (1 MB saved over plain gzip) I would say it isn’t.

Again, gzip may be the winner on compression, but that says nothing about throughput, which I had not tested at this point.

Update: I’ve done a quick test on CPU load and throughput, and I wouldn’t recommend gzip unless you are really limited on disk space or have plenty of CPU to spare. lzjb is much faster (less load on the CPU) and does a pretty good job of compressing on the fly.

08/06/2011 Update

If you want to check the compression on a particular file, you can combine ls (true file size) and du (size on disk after compression) like so…

Actual size

ls -lh file*

Compressed size

du -hs file*
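
If you want both numbers side by side for one file, something like the following sketch should do. The function name sizes is made up for this example; it uses wc -c for the apparent size and du -k for the space actually allocated, so on a compressed dataset the second number is the post-compression size.

```shell
# sizes FILE: print the apparent size and the on-disk size of FILE in KB.
# "sizes" is a hypothetical helper name, not a ZFS or system command.
sizes() {
    bytes=$(wc -c < "$1")                        # apparent size in bytes
    apparent_kb=$(( (bytes + 1023) / 1024 ))     # rounded up to whole KB
    disk_kb=$(du -k "$1" | awk '{ print $1 }')   # KB actually allocated
    echo "$1: apparent ${apparent_kb} KB, on disk ${disk_kb} KB"
}
```

Call it as sizes somefile. For a whole file system rather than a single file, ZFS also keeps a read-only compressratio property, which you can read with zfs get compressratio poolname.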

 

ZFS basics

I’ve had a play with WHS but eventually got annoyed with its lack of performance. Yes, I know it’s not built for performance and is typically used just as a backup / simple store with duplication as redundancy, but I couldn’t stand the speed of the thing. If you were ever unlucky enough (even post Power Pack 1) to do a copy during the “data moving” phase, the performance was even worse. On a positive note, nothing beats it if you have a heap of differently sized disks that you want to put together (with redundancy) as a single shared storage pool.

Welcome to ZFS performance bliss…

Grab yourself OpenSolaris (I’m using 2008.11).

The tools of the trade are:

zpool – this manages the zfs pools

zfs – this manages the zfs file systems

I was lucky enough to have 3 x 250GB drives, which I set up in raidz1 (similar to RAID 5, single-drive redundancy). The rest of my drives were just set up as a striped volume, which mainly holds things I can afford to lose if a drive dies. I used a separate 500GB disk as the system disk.

After I had built the server, I put in only the disks I wanted to work with next. So first I installed the 3 x 250GB disks and booted the machine. Running format and then pressing Ctrl-C showed me the device names. By deduction you can figure out the names of the 3 new drives. Now it’s time to create a new raidz1 zpool with the following command:

zpool create poolname raidz1 dev2 dev3 dev4

Done. You should now have a mounted (and usable) file system at /poolname. If you don’t want any redundancy, just drop “raidz1” from the above command and you get what is essentially a striped pool. Check the status of your zpool with this command:

zpool status poolname

Another thing I like to change at the root of a new zpool is compression, so I usually run this command:

zfs set compression=on poolname

This enables compression (note: it does not typically slow down your file server if you have the spare CPU). See this post for further details on ZFS compression.

To check the settings currently applied to your pool, run:

zfs get all poolname

If you want to create some additional ZFS file systems within the zpool, use the following command:

zfs create -o casesensitivity=mixed -o nbmand=on poolname/share

casesensitivity=mixed allows Windows clients (via SMB) to open files even when the name is not given in exactly its original case. This has to be set when the file system is created. nbmand=on enables cross-protocol locking.
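
To pull the steps above together, here is a rough sketch of the whole sequence as one script. POOL, DISKS, DRY_RUN, run and setup_pool are names I made up for the example, and dev2 dev3 dev4 are just the placeholder device names used earlier. With DRY_RUN=1 (the default here) it only prints the commands, so you can check them before touching any disks.

```shell
# Sketch of the setup steps from this post, collected in one place.
# POOL, DISKS and DRY_RUN are assumed variable names; dev2 dev3 dev4 are
# placeholders, so substitute the device names that format shows you.
POOL=${POOL:-poolname}
DISKS=${DISKS:-"dev2 dev3 dev4"}
DRY_RUN=${DRY_RUN:-1}

# run CMD ARGS...: print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

setup_pool() {
    # $DISKS is left unquoted on purpose so it splits into one argument per disk.
    run zpool create "$POOL" raidz1 $DISKS
    run zpool status "$POOL"
    run zfs set compression=on "$POOL"
    run zfs create -o casesensitivity=mixed -o nbmand=on "$POOL/share"
}

setup_pool
```

Running it with DRY_RUN=0 would execute the commands for real, which creates the pool on the listed disks, so only do that once the printed commands look right.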

The future of ZFS…

ZFS is adding more and more features as time goes on. I have heard rumours about some kind of de-duplication (single-instance storage) technology being implemented at some point. There is also talk of a data merger, which I assume would spread data more evenly across the pool.

Removing a device from a pool is also on the cards. I’m not sure whether this will cover both striped and redundant pools.
