zfs – accidentally adding a cache drive to a raidz zpool

http://forums.freebsd.org/showthread.php?t=23127

Unfortunately, if you have accidentally added a single drive to your raidz pool as a top-level vdev (for example by leaving out the cache keyword), there is no way to simply remove the non-redundant disk. Your pool now depends on this disk: if it fails, the whole pool is lost.

If you want your pool to consist only of raidz vdevs, you will need to back up your data, destroy your pool, create a new pool, and restore your data.

There is currently no way to remove a top-level vdev from a pool.
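If you suspect this has happened, zpool status shows it: the stray disk sits at the same indentation as the raidz vdev instead of under a cache heading. Below is a minimal Python sketch that flags such disks. It assumes zpool status indents each nesting level further than its parent, and the pool and device names in EXAMPLE_STATUS (tank, da0 to da4) are hypothetical. To avoid the mistake in the first place, zpool add -n tank cache da4 prints the resulting layout without changing the pool.

```python
# Sketch: find single disks added as top-level vdevs next to raidz/mirror vdevs.
# Assumes the indentation layout printed by `zpool status`; the names in
# EXAMPLE_STATUS are hypothetical.

REDUNDANT_PREFIXES = ("raidz", "mirror")

EXAMPLE_STATUS = """\
        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1    ONLINE       0     0     0
            da0     ONLINE       0     0     0
            da1     ONLINE       0     0     0
            da2     ONLINE       0     0     0
          da3       ONLINE       0     0     0
        cache
          da4       ONLINE       0     0     0

errors: No known data errors
"""


def indent_of(line: str) -> int:
    return len(line) - len(line.lstrip())


def stray_top_level_disks(status_text: str) -> list[str]:
    """Return top-level data vdevs that are plain disks in a pool that also
    contains raidz or mirror vdevs (those disks have no redundancy)."""
    lines = [line for line in status_text.splitlines() if line.strip()]
    # The row right after the "NAME STATE ..." header is the pool itself.
    start = next(i for i, line in enumerate(lines) if line.split()[0] == "NAME")
    pool_indent = indent_of(lines[start + 1])
    top_indent = None
    top_level = []
    for line in lines[start + 2:]:
        indent = indent_of(line)
        if indent <= pool_indent:
            break  # "logs", "cache", "spares" or "errors:" ends the data vdevs
        if top_indent is None:
            top_indent = indent
        if indent == top_indent:
            top_level.append(line.split()[0])
    if not any(name.startswith(REDUNDANT_PREFIXES) for name in top_level):
        return []  # plain stripe: no single disk is "stray", the whole pool is unprotected
    return [name for name in top_level if not name.startswith(REDUNDANT_PREFIXES)]


if __name__ == "__main__":
    print(stray_top_level_disks(EXAMPLE_STATUS))  # ['da3']
```

Here da3 is reported because it sits beside raidz1 rather than under the cache heading with da4.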

OpenSolaris – ZFS recovery after kernel panic

Recently I hit what I thought was a huge disaster with my ZFS array: I was unable to import my zpool without causing the kernel to panic and reboot. I’m still unsure of the exact reason, but it didn’t seem to be due to a hardware fault (zpool import showed all disks as ONLINE).

When I tried to import with zpool import -f tank, the machine would lock up and reboot (panic).

The key line from the kernel panic:

> genunix: [ID 361072 kern.notice] zfs: freeing free segment (offset=3540185931776 size=22528)

Nothing I tried would fix it. I added both of these options to the /etc/system file with no success:

set zfs:zfs_recover=1
set aok=1

After a quick email from a Sun engineer (kudos to Victor), this is the zdb command line that fixed it:

zdb -e -bcsvL <poolname>

zdb is a read-only diagnostic tool, yet it seemed to read through the sectors holding the corrupt data and fix things (I’m not sure how a read-only tool does that). The run took well over 15 hours.

Updated: 20/10/2009

Apparently, if you have set zfs:zfs_recover=1 in /etc/system, zdb behaves differently and fixes the issues it encounters.

If you are lucky enough to get the pool back online, remember to run zpool scrub <poolname>.

This thread has some additional info…

http://opensolaris.org/jive/message.jspa?messageID=479553

Update 31/05/2012

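This command has also helped me when I can’t mount a pool in read-write mode. -F tries to recover the pool by discarding the last few transactions, -f forces the import, -o readonly=on imports the pool without writing to it, and -R /mnt/temp mounts it under an alternate root: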

zpool import -F -f -o readonly=on -R /mnt/temp zpool2

zfs – playing with various configs

If you don’t have spare disks to build a zpool and experiment with ZFS, you can use plain files created with the mkfile command instead. The commands are exactly the same, except that file vdevs must be given as full paths.

mkfile 64m disk1

mkfile 64m disk2

mkfile 64m disk3

mkfile 100m disk4

mkfile 100m disk5

mkfile 100m disk6
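mkfile is a Solaris command. On Linux, truncate -s 64M disk1 does a similar job, or a short Python sketch like the one below can create the backing files. It is only an illustration. sparse=True leaves the blocks unallocated (like mkfile -n), and the 64 MiB floor reflects ZFS’s minimum device size. The function returns absolute paths because zpool wants full paths for file vdevs.

```python
# Sketch: create fixed-size files to use as ZFS file vdevs, like mkfile.
import tempfile
from pathlib import Path

MIB = 1024 * 1024
MIN_VDEV_MIB = 64  # ZFS refuses devices smaller than 64 MiB (SPA_MINDEVSIZE)


def too_small(layout: dict[str, int]) -> list[str]:
    """Names whose requested size (MiB) is below the ZFS minimum."""
    return [name for name, size in layout.items() if size < MIN_VDEV_MIB]


def make_backing_files(directory, layout: dict[str, int], sparse: bool = True) -> list[str]:
    """Create one file per entry in `layout` (name -> size in MiB) and return
    their absolute paths. sparse=True is like `mkfile -n` (blocks not
    allocated); sparse=False writes zeros like plain mkfile."""
    small = too_small(layout)
    if small:
        raise ValueError(f"below {MIN_VDEV_MIB} MiB: {', '.join(small)}")
    base = Path(directory).resolve()  # zpool needs full paths to file vdevs
    base.mkdir(parents=True, exist_ok=True)
    paths = []
    for name, size_mib in layout.items():
        path = base / name
        with open(path, "wb") as f:
            if sparse:
                f.truncate(size_mib * MIB)  # extend to size without writing data
            else:
                zeros = bytes(MIB)
                for _ in range(size_mib):
                    f.write(zeros)
        paths.append(str(path))
    return paths


if __name__ == "__main__":
    # Demo in a throwaway directory; pass a real directory to keep the files.
    files = make_backing_files(tempfile.mkdtemp(), {"disk1": 64, "disk2": 64, "disk3": 64})
    print("zpool create test raidz " + " ".join(files))
```

The script only prints the zpool create command. Run it yourself once you are happy with the paths.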

Now you can create a zpool from the first three files (I’m using raidz for this setup):

zpool create test raidz /fullpath/disk1 /fullpath/disk2 /fullpath/disk3

If you now want to expand this pool with another three drives (files), run this command:

zpool add test raidz /fullpath/disk4 /fullpath/disk5 /fullpath/disk6

Check the status of the zpool:

zpool status test

NAME                          STATE     READ WRITE CKSUM
test                          ONLINE       0     0     0
  raidz1                      ONLINE       0     0     0
    /export/home/daz/disk1    ONLINE       0     0     0
    /export/home/daz/disk2    ONLINE       0     0     0
    /export/home/daz/disk3    ONLINE       0     0     0
  raidz1                      ONLINE       0     0     0
    /export/home/daz/disk4    ONLINE       0     0     0
    /export/home/daz/disk5    ONLINE       0     0     0
    /export/home/daz/disk6    ONLINE       0     0     0

errors: No known data errors


Now it’s time to replace drives (perhaps you want to grow your space gradually). Note: every drive in a given raidz vdev must be replaced with a larger one before the additional space shows up, because a raidz vdev’s capacity is limited by its smallest member.

mkfile 200m disk7

mkfile 200m disk8

mkfile 200m disk9

Check the size of the zpool first:

zpool list test

NAME   SIZE   USED  AVAIL    CAP  HEALTH  ALTROOT
test   464M   349M   115M    75%  ONLINE  -

Now replace all of the smaller drives with the larger ones:

zpool replace test /export/home/daz/disk1 /export/home/daz/disk7
zpool replace test /export/home/daz/disk2 /export/home/daz/disk8
zpool replace test /export/home/daz/disk3 /export/home/daz/disk9

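The extra space shows up once you bounce (reboot) the box. I’ve heard that you may sometimes need to export and re-import the pool instead, but I’ve never had to do that. Later ZFS releases also have an autoexpand pool property (zpool set autoexpand=on test) that makes the new space appear without either step.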
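The rule that space only appears after every member is replaced follows from how a raidz vdev is sized: each member contributes only as much as the smallest one. Here is a rough Python model of that. It ignores labels, metadata and allocation overhead, so real zpool list figures come out somewhat lower. zpool list reports raw size including parity, roughly members × smallest. Usable space is about (members − parity) × smallest.

```python
# Rough raidz sizing model (overhead ignored): every member counts as the smallest one.


def raidz_vdev_size(member_sizes: list[int], parity: int = 1) -> tuple[int, int]:
    """Return (raw, usable) for one raidz vdev. Raw includes parity, which is
    roughly what `zpool list` reports; usable excludes it."""
    if len(member_sizes) <= parity:
        raise ValueError("a raidz vdev needs more members than parity disks")
    smallest = min(member_sizes)
    return len(member_sizes) * smallest, (len(member_sizes) - parity) * smallest


def pool_size(vdevs: list[list[int]], parity: int = 1) -> tuple[int, int]:
    """Sum (raw, usable) over the top-level raidz vdevs of a pool."""
    sizes = [raidz_vdev_size(members, parity) for members in vdevs]
    return sum(raw for raw, _ in sizes), sum(usable for _, usable in sizes)


if __name__ == "__main__":
    # Replacing the 64 MiB files disk1..disk3 with 200 MiB files one at a time.
    for members in ([64, 64, 64], [200, 64, 64], [200, 200, 64], [200, 200, 200]):
        print(members, raidz_vdev_size(members))
```

The result stays at (192, 128) until the last 64 MiB member is gone, then jumps to (600, 400). The same model puts three 250GB disks in raidz1 at about 500GB usable.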

zfs – checking your zpool throughput

This is quite a good diagnostic for checking your disk throughput. Try copying data to and from your zpool while you’re running this command on the host:

zpool iostat -v unprotected 2

               capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
unprotected  1.39T   668G     18      7  1.35M   161K
  c7d0        696G   403M      1      2  55.1K  21.3K
  c9d0        584G   112G      8      2   631K  69.3K
  c7d1        141G   555G      8      2   697K  70.0K
-----------  -----  -----  -----  -----  -----  -----

The command keeps printing this output every 2 seconds (each sample is the average over that interval). I’ve used it a few times to make sure all disks are taking part in write operations where they should be. Reads, of course, won’t necessarily be spread across all disks, since that depends on where the data lives.

As the output from my “unprotected” zpool shows, disk “c7d0” is nearly full (only 403M free), so it receives fewer write operations. In my setup most reads also come from this disk, because I copied most of the data into this zpool when it contained only that single disk.
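To check for this without eyeballing the columns, the Python sketch below reads one report and flags disks under a free-space threshold. You could save the report from zpool iostat -v unprotected, run with no interval. The 5% default threshold is an arbitrary assumption. The parser treats every row below the pool total as a disk, so on raidz or mirror pools the group rows would be checked too.

```python
# Sketch: flag nearly full disks in one `zpool iostat -v <pool>` report.

UNITS = {"": 1, "K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4, "P": 1024**5}

EXAMPLE_IOSTAT = """\
               capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
unprotected  1.39T   668G     18      7  1.35M   161K
  c7d0        696G   403M      1      2  55.1K  21.3K
  c9d0        584G   112G      8      2   631K  69.3K
  c7d1        141G   555G      8      2   697K  70.0K
-----------  -----  -----  -----  -----  -----  -----
"""


def parse_size(text: str) -> float:
    """Convert a zpool-style size such as '1.39T' or '403M' to bytes (1024-based)."""
    suffix = text[-1].upper() if text[-1].isalpha() else ""
    number = text[:-1] if suffix else text
    return float(number) * UNITS[suffix]


def nearly_full(iostat_text: str, pool: str, min_free: float = 0.05) -> list[str]:
    """Return disks whose free space is below `min_free` of their capacity."""
    flagged = []
    for line in iostat_text.splitlines():
        fields = line.split()
        # Device rows have 7 columns; skip the header, separators and pool total.
        if len(fields) != 7 or fields[0] in ("pool", pool) or fields[0].startswith("-"):
            continue
        used, avail = parse_size(fields[1]), parse_size(fields[2])
        if avail / (used + avail) < min_free:
            flagged.append(fields[0])
    return flagged


if __name__ == "__main__":
    print(nearly_full(EXAMPLE_IOSTAT, "unprotected"))  # ['c7d0']
```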

I’ve heard rumors of a future ZFS feature that will rebalance data across all the disks (unsure whether it would run live or on a set schedule).

Another way to show some disk throughput figures is to run the iostat command like so…

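iostat -exn 10

Without the -e and -n flags (plain iostat -x 10), Solaris prints a shorter report that uses driver instance names such as cmdk17:

extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk17    1.0    0.0   71.5    0.0  0.0  0.0   10.9   0   1
cmdk18    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk19    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk20    0.8    0.0   33.5    0.0  0.0  0.0   13.5   0   1
cmdk21    0.4    0.0    0.5    0.0  0.0  0.0   15.5   0   1
cmdk22    0.8    0.0   66.3    0.0  0.0  0.0    9.0   0   1
cmdk23    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk24    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0

With -exn, each device is shown by its descriptive name (c7t4d0 and so on). The service time is split into wsvc_t (time in the wait queue) and asvc_t (active service time), and soft (s/w), hard (h/w), transport (trn) and total (tot) error counts are added:

                     extended device statistics       ---- errors ---
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device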
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c11d1
0.0    7.7    0.0   25.8  0.0  0.0    2.3    4.9   0   3   0   0   0   0 c8d0
0.0   17.6    0.0  238.0  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c9d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t0d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0
0.7   21.1   29.9  315.0  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t4d0
0.7   20.9   29.8  314.9  0.0  0.0    0.0    1.7   0   2   0   0   0   0 c7t5d0
0.8   21.0   34.1  315.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c7t6d0
0.5   20.8   21.3  314.8  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t7d0

This shows all your disks and refreshes every 10 seconds (the final argument is the interval in seconds). Copy data back and forth to your drives while it runs to see the figures change.
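If you collect these reports over time, a small parser helps spot trouble. The Python sketch below reads one report, using the -exn output above as its sample. It lists devices with a non-zero tot error count and picks the busiest device by %b. It assumes the -n layout, with the device name in the last column. The function names are illustrative.

```python
# Sketch: parse one `iostat -exn` report and summarise it.

EXAMPLE_EXN = """\
                     extended device statistics       ---- errors ---
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c11d1
0.0    7.7    0.0   25.8  0.0  0.0    2.3    4.9   0   3   0   0   0   0 c8d0
0.0   17.6    0.0  238.0  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c9d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t0d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0
0.7   21.1   29.9  315.0  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t4d0
0.7   20.9   29.8  314.9  0.0  0.0    0.0    1.7   0   2   0   0   0   0 c7t5d0
0.8   21.0   34.1  315.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c7t6d0
0.5   20.8   21.3  314.8  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t7d0
"""


def parse_iostat_exn(text: str) -> list[dict]:
    """Turn one report into dicts keyed by column name; numbers become floats."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if fields[:1] == ["r/s"]:
            header = fields  # with -n the device name is the last column
        elif header and len(fields) == len(header):
            row = {key: float(value) for key, value in zip(header[:-1], fields[:-1])}
            row["device"] = fields[-1]
            rows.append(row)
    return rows


def devices_with_errors(rows: list[dict]) -> list[str]:
    """Devices reporting any soft, hard or transport errors (tot > 0)."""
    return [row["device"] for row in rows if row["tot"] > 0]


def busiest(rows: list[dict]) -> str:
    """Device with the highest %b (percent of time busy)."""
    return max(rows, key=lambda row: row["%b"])["device"]


if __name__ == "__main__":
    rows = parse_iostat_exn(EXAMPLE_EXN)
    print(devices_with_errors(rows), busiest(rows))  # [] c8d0
```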

ZFS basics

I’ve had a play with Windows Home Server (WHS) but eventually got annoyed with its lack of performance. Yes, I know it’s not built for performance and is typically used just as a backup or simple store with duplication for redundancy, but I couldn’t stand how slow it was. If you were ever unlucky enough to copy files while it was doing its “data moving” (even after Power Pack 1), performance was even worse. On the positive side, nothing beats it if you have a heap of differently sized disks that you want to combine (with redundancy) into a single shared storage pool.

Welcome to ZFS performance bliss…

Grab yourself OpenSolaris (I’m using 2008.11).

The tools of the trade are:

zpool – this manages the zfs pools

zfs – this manages the zfs file systems

I was lucky enough to have 3 x 250GB drives, which I set up as raidz1 (similar to RAID 5: single-drive redundancy). The rest of my drives were set up as a plain striped pool holding mainly things I can afford to lose if a drive dies. I used a separate 500GB disk as the system disk.

After I had built the server, I put in only the disks I wanted to work with next. So first I installed the 3 x 250GB disks and booted the machine. Running format and then pressing Ctrl-C showed me the device names; by elimination you can work out the names of the 3 new drives. Now it’s time to create a new raidz1 zpool with the following command:

zpool create poolname raidz1 dev2 dev3 dev4

Done: you should now have a mounted, usable file system at /poolname. If you don’t want any redundancy, just leave “raidz1” out of the above command and you get a plain striped pool. Check the status of your zpool with this command:

zpool status poolname

Another thing I like to set at the root of a new zpool is compression, so I usually run this command:

zfs set compression=on poolname

This enables compression (note: it doesn’t typically slow down your file server if you have spare CPU). See this post for further details on zfs compression.

To check the settings currently applied to your pool, run:

zfs get all poolname

If you want to create additional ZFS file systems within the zpool, use a command like this:

zfs create -o casesensitivity=mixed -o nbmand=on poolname/share

casesensitivity=mixed lets Windows clients (via SMB) access files even when a name isn’t given in its exact original case; it can only be set when the file system is created. nbmand=on enables non-blocking mandatory locks, which provide cross-protocol locking.
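To confirm that settings such as compression and casesensitivity took effect, zfs get -H -o property,value all poolname prints tab-separated output without headers, which is easy to parse. A minimal Python sketch follows. The wanted values and the sample output in EXAMPLE_GET are illustrative. get_props simply shells out to the zfs command, so it only works on a host with ZFS.

```python
# Sketch: compare ZFS properties against the values you intended to set.
import subprocess

EXAMPLE_GET = "compression\ton\ncasesensitivity\tmixed\nnbmand\toff\n"  # illustrative only


def parse_zfs_get(text: str) -> dict[str, str]:
    """Parse `zfs get -H -o property,value ...` output (one tab-separated pair per line)."""
    props = {}
    for line in text.splitlines():
        if line.strip():
            prop, value = line.split("\t", 1)
            props[prop] = value
    return props


def get_props(dataset: str) -> dict[str, str]:
    """Run zfs get on a live system (requires the zfs command)."""
    out = subprocess.run(
        ["zfs", "get", "-H", "-o", "property,value", "all", dataset],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_zfs_get(out)


def mismatches(props: dict[str, str], wanted: dict[str, str]) -> dict:
    """Properties whose current value differs from the wanted one (None if missing)."""
    return {key: props.get(key) for key, value in wanted.items() if props.get(key) != value}


if __name__ == "__main__":
    wanted = {"compression": "on", "casesensitivity": "mixed", "nbmand": "on"}
    print(mismatches(parse_zfs_get(EXAMPLE_GET), wanted))  # {'nbmand': 'off'}
```

On a real box, replace parse_zfs_get(EXAMPLE_GET) with get_props("poolname/share").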

The future of ZFS…

ZFS keeps gaining features as time goes on. I have heard rumors about some kind of de-duplication (single-instance storage) technology being implemented at some point. There is also talk of a “data merger”, which I assume would move data across the pool more evenly.

Removing a device from a pool is also on the cards, though I’m unsure whether it will cover both striped and redundant pools.
