Archive

Posts Tagged ‘zfs’

4k sector hard drives and zfs

June 6th, 2010 Daz No comments

I hit this problem recently. One of the disks in my raidz died, so I ran down to the store and grabbed a replacement WD10EARS (Western Digital 1TB Green) drive.

BUT…

The one thing the store didn’t mention is the new 4K sector size on the drive. I guess they assume most people run Windows (though the issues are also present in XP). See these posts…

http://blog.temeletry.co.uk/2010/05/wd-green-wd10ears/

Unfortunately they really don’t work as well as you’d like in a server :(

  • They come with a 5 second head spin down setting that causes them to park their heads if they have been left idle for more than 5 seconds. As it takes a second or two to spin back up this can result in a very laggy experience during interactive sessions.
  • They do not have NCQ or any form of command queuing/optimisation. This means that (on FreeBSD at least) you are stuck with the LOOK elevator. This was particularly noticeable when doing sequential reads and writes (think dump | restore, tar | untar, etc.) and interactive tasks simultaneously.
  • They really suck with FreeBSD and ZFS…

http://community.wdc.com/t5/Desktop/Poor-performace-in-OpenSolaris-with-4K-sector-drive-WD10EARS-in/m-p/21132

While the other 512-byte sector HDDs were reading/writing at 30MB/s sustained, this EARS model did not exceed the 1MB/s barrier.

I know for sure that this is related to the 512-byte sector firmware emulation, because the disk works perfectly well if I partition it in a 4k-sector alignment.

Even with that alignment, performance in a ZFS RAIDZ configuration is very poor, because RAIDZ uses a dynamic stripe size.

The bottom line here is that folks like me, who use different versions of Unix, need the firmware to present the disk as a 4K-sector disk to unleash the full potential of the technology. The OS is already prepared to support that sector size, so there is no need for emulation here.
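To make the alignment point concrete, here is a minimal shell sketch. It assumes the drive exposes 512-byte logical sectors on top of 4096-byte physical sectors, as described for the EARS drives above. The function names is_aligned and needs_rmw are made up for illustration. The idea: a write that does not start and end on a 4K boundary forces the drive to read, modify and rewrite a physical sector, and as far as I understand it RAIDZ on 512-byte sectors can issue writes of any multiple of 512 bytes.

```shell
#!/bin/sh
# Assumed geometry: 512-byte logical sectors emulated on 4096-byte physical sectors.
LOGICAL=512
PHYSICAL=4096
PER_PHYS=$(( $PHYSICAL / $LOGICAL ))   # 8 logical sectors per physical sector

# is_aligned START_LBA
# Prints "aligned" if the logical block address sits on a physical-sector boundary.
is_aligned() {
    if [ $(( $1 % $PER_PHYS )) -eq 0 ]; then
        echo aligned
    else
        echo misaligned
    fi
}

# needs_rmw START_LBA COUNT
# Prints "yes" if a write of COUNT logical sectors starting at START_LBA only
# partially covers a physical sector (so the drive must read-modify-write).
needs_rmw() {
    if [ $(( $1 % $PER_PHYS )) -eq 0 ] && [ $(( $2 % $PER_PHYS )) -eq 0 ]; then
        echo no
    else
        echo yes
    fi
}

is_aligned 63        # old-style partition start  -> misaligned
is_aligned 2048      # 1MiB partition start       -> aligned
needs_rmw 2048 8     # one full 4K write          -> no
needs_rmw 2048 3     # 1.5K write, aligned start  -> yes
```

The last case is why partition alignment alone does not help: the start is aligned, but the write is shorter than a physical sector.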

http://opensolaris.org/jive/thread.jspa?threadID=125702

Some preliminary testing that I have done: the WD20EARS (2TB Advanced Format drive) actually presents emulated 512-byte sectors to the host OS.

The drive documentation indicates that jumpers 7-8 should be enabled if the OS does not support Advanced Format drives, but the drive still presents 512-byte sectors.

I have attempted to raise a support ticket querying this, and asking how one can disable 512-byte sector emulation in the drive (perhaps through a firmware upgrade), but I have not received any response to date.

Hopefully, if enough people raise support tickets, WD may release firmware that allows the drive to natively present 4K blocks. Other doco indicates several other jumper combinations, but none of them seem to make the drive present 4K blocks.

Perhaps someone internal to Sun who has a relationship with WD may be able to shed some light on this? It would be fantastic to find out that I was just doing something wrong; then I could get the drives to be seen on 32-bit systems (i.e. our embedded kit for osol, velitium).

Tested using b133 (64-bit Intel).

Try to avoid the Green drives in ZFS for now. Remember to do your research before you buy a bunch of disks. I was caught off guard by this small change (works fine in Windows 7 etc.) which kills performance in ZFS. Ouch.

Categories: Linux, OpenSolaris, Storage

zfs – now has dedup!

April 5th, 2010 Daz No comments

Cool. ZFS has deduplication built in as of pool version 21. And that’s the good dedup: synchronous dedup, i.e. deduped on the fly!

How easy is it to turn on? Very!

Once you have upgraded your zpool to version 21 or above, you can run the following command on the pool’s top-level dataset and deduplication will apply to all data written from that point onwards. dedup is a dataset property, so it is set with zfs rather than zpool.

zfs set dedup=on tank

Done

Note: Watch your performance; it will drop like a rock if you do not have enough RAM for your dedup tables. Do some tests after enabling this feature.
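For a rough feel of how much RAM "enough" might be, here is a back-of-the-envelope sketch. Both inputs are assumptions, not figures from this post: roughly 320 bytes of memory per dedup table entry is a commonly quoted rule of thumb, and the average block size depends entirely on your data. The function name ddt_ram_mib is made up for illustration.

```shell
#!/bin/sh
# ddt_ram_mib DATA_GIB AVG_BLOCK_KIB [ENTRY_BYTES]
# Estimates dedup table size in MiB, assuming one table entry per unique block.
# ENTRY_BYTES defaults to 320, an assumed rule-of-thumb value.
ddt_ram_mib() {
    data_gib=$1
    avg_block_kib=$2
    entry_bytes=${3:-320}
    blocks=$(( $data_gib * 1024 * 1024 / $avg_block_kib ))
    echo $(( $blocks * $entry_bytes / 1024 / 1024 ))
}

ddt_ram_mib 1024 64     # 1TiB of unique data in 64K blocks  -> 5120
ddt_ram_mib 1024 128    # same data in 128K blocks           -> 2560
```

Smaller blocks mean more entries, so a pool full of small files needs far more RAM than these numbers suggest. Treat the output as an order-of-magnitude guide only and measure on your own pool.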

http://hub.opensolaris.org/bin/view/Community+Group+zfs/dedup

Categories: OpenSolaris, Storage

OpenSolaris – iSCSI

November 13th, 2009 Daz 3 comments

Want iSCSI in OpenSolaris?

Grab SUNWiscsitgt via package manager.

enable the server via svcadm;

svcadm enable iscsitgt

create your zfs iscsi volume;  (this command creates a 500GB zvol, which caps the iSCSI drive at that size)

zfs create -V 500G tank/iscsi

turn iSCSI sharing on via the zfs command;

zfs set shareiscsi=on tank/iscsi

check that target is up and running;

iscsitadm list target -v

Done. Should be able to connect via ip from another machine. I have not covered CHAP or any client side configuration. Assumed isolated LAN.
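The steps above can be collected into one small script. This is only a sketch: the run helper, the setup_iscsi function and the DRY_RUN variable are made up for illustration, and by default it just prints the commands from this post rather than executing them.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# setup_iscsi VOLUME SIZE
# Enables the iSCSI target service, creates a zvol of SIZE, shares it over
# iSCSI and lists the resulting target. Stops at the first failing step.
setup_iscsi() {
    vol=$1
    size=$2
    run svcadm enable iscsitgt &&
    run zfs create -V "$size" "$vol" &&
    run zfs set shareiscsi=on "$vol" &&
    run iscsitadm list target -v
}

setup_iscsi tank/iscsi 500G
```

Chaining with && means a failed zfs create will not go on to share a volume that does not exist.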

[Image: HD Tune benchmark of the SUN SOLARIS disk]

FreeNAS – zfs in version 0.7

October 28th, 2009 Daz No comments

If you want to give ZFS a go and also want a dedicated file server, this is the solution.

Check it out here;

http://www.freenas.org/


I’m still using OpenSolaris though, as I like running a few VirtualBox machines on the same box.

Categories: Storage

OpenSolaris – ZFS recovery after kernel panic

October 19th, 2009 Daz No comments

Recently I hit what I thought was a huge disaster with my ZFS array. Essentially I was unable to import my zpool without causing the kernel to panic and reboot. I’m still unsure of the exact reason, but it didn’t seem to be due to a hardware fault (zpool import showed all disks as ONLINE).

When I tried to import with zpool import -f tank the machine would lock up and reboot (panic).

The kernel panic;  (key line)

> genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)

Nothing I could do would fix it… I tried both of these options in the system file (/etc/system) with no success;

set zfs:zfs_recover=1
set aok=1

After a quick email from a Sun Engineer (kudos to Victor), the zdb command line that fixed it;

zdb -e -bcsvL <poolname>

zdb is a read-only diagnostic tool, but it seemed to read through the sectors that had the corrupt data and fix things?? (Not sure how a read-only tool does that.) The run took well over 15 hours.

Updated: 20/10/2009

Apparently, if you have set zfs:zfs_recover=1 in your system file, the zdb command operates in a different manner, fixing the issues it encounters.

Remember to run a zpool scrub <poolname> if you are lucky enough to get it back online.
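A rough sketch of the sequence described above, for reference. The ordering (zdb first, then import, then scrub) is my reading of this post, not a documented procedure. The run helper, the recover_pool function and the DRY_RUN variable are made up for illustration, and the final zpool status check is an extra step that is not in the original write-up. By default it only prints the commands.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints each command; set DRY_RUN=0 to execute.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# recover_pool POOLNAME
# Walks the exported pool with zdb, then tries a forced import, then scrubs
# and shows pool status. Stops at the first step that fails.
recover_pool() {
    pool=$1
    run zdb -e -bcsvL "$pool" &&      # took well over 15 hours on the author's pool
    run zpool import -f "$pool" &&
    run zpool scrub "$pool" &&
    run zpool status -v "$pool"       # extra check, not from the original post
}

recover_pool tank
```

Given how long the zdb pass can take, it is worth running the real thing inside a session that survives a dropped connection.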

This thread has some additional info…

http://opensolaris.org/jive/message.jspa?messageID=479553

Categories: OpenSolaris, Storage