unraid smart check – dead WD green drive

Errors on the unraid GUI: sometimes it’s a loose cable, sometimes it’s an issue with the drive.

Run this command to check the SMART status:

smartctl -a -d ata /dev/sda
or, if you are using a newer SATA controller:
smartctl -a -A /dev/sda

http://lime-technology.com/wiki/index.php/Troubleshooting

Unfortunately, in my case it looks like the drive is pretty much dead… not too bad for a drive almost 5 years old.

It’s pretty typical for a WD green drive in its default config to die in this type of environment, and I have no plans to replace it with a similar type of drive. You can see below the incredibly high LCC (Load_Cycle_Count), which indicates how many times the drive heads have parked over its life. This is probably part of the problem. There is a tool called WDIDLE3 (check this vid, the link for WDIDLE3 is also in the comments: http://www.youtube.com/watch?v=J2eYyRI_F98) which disables the intellipark feature of the green drive. I never changed the park timeout (which defaults to 8 seconds!) before this drive died. Note: I have disabled it completely on my other green drives.

=== START OF INFORMATION SECTION ===

Model Family: Western Digital Caviar Green
Device Model: WDC WD10EADS-00M2B0
Serial Number: WD-WCAV51020991
LU WWN Device Id: 5 0014ee 2588170a5
Firmware Version: 01.00A01
User Capacity: 1,000,204,886,016 bytes [1.00 TB]
Sector Size: 512 bytes logical/physical
Device is: In smartctl database [for details use: -P show]
ATA Version is: ATA8-ACS (minor revision not indicated)
SATA Version is: SATA 2.6, 3.0 Gb/s
Local Time is: Thu Oct 30 18:48:41 2014 NZDT
SMART support is: Available - device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: FAILED!
Drive failure expected in less than 24 hours. SAVE ALL DATA.
See vendor-specific Attribute list for failed Attributes.

SMART Attributes Data Structure revision number: 16
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
1 Raw_Read_Error_Rate 0x002f 168 154 051 Pre-fail Always - 12560032
3 Spin_Up_Time 0x0027 149 105 021 Pre-fail Always - 5508
4 Start_Stop_Count 0x0032 099 099 000 Old_age Always - 1253
5 Reallocated_Sector_Ct 0x0033 119 119 140 Pre-fail Always FAILING_NOW 648
7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0
9 Power_On_Hours 0x0032 041 041 000 Old_age Always - 43079
10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0
11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0
12 Power_Cycle_Count 0x0032 100 100 000 Old_age Always - 371
192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 363
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 1932037
194 Temperature_Celsius 0x0022 118 076 000 Old_age Always - 29
196 Reallocated_Event_Count 0x0032 001 001 000 Old_age Always - 463
197 Current_Pending_Sector 0x0032 199 193 000 Old_age Always - 323
198 Offline_Uncorrectable 0x0030 199 190 000 Old_age Offline - 186
199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0
200 Multi_Zone_Error_Rate 0x0008 003 001 000 Old_age Offline - 39455

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
Num Test_Description Status Remaining LifeTime(hours) LBA_of_first_error
# 1 Short offline Completed: read failure 90% 24914 789707146
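
To put the numbers above in perspective, here is a minimal Python sketch (not part of smartctl, just an illustration) that reads a few rows of the attribute table, lists attributes whose normalised VALUE has dropped to or below THRESH, and divides the raw Load_Cycle_Count by the raw Power_On_Hours. The function names are made up for this example, and it assumes each row has the ten whitespace-separated columns shown in the table.

```python
# Rows copied from the smartctl attribute table above. Columns:
# ID NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE
SMART_ROWS = """\
5 Reallocated_Sector_Ct 0x0033 119 119 140 Pre-fail Always FAILING_NOW 648
9 Power_On_Hours 0x0032 041 041 000 Old_age Always - 43079
193 Load_Cycle_Count 0x0032 001 001 000 Old_age Always - 1932037
"""


def parse_rows(text):
    """Return {attribute_name: {"value", "thresh", "raw"}} for each table row."""
    attrs = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip anything that does not look like an attribute row.
        if len(fields) < 10 or not fields[0].isdigit():
            continue
        attrs[fields[1]] = {
            "value": int(fields[3]),
            "thresh": int(fields[5]),
            "raw": int(fields[9]),
        }
    return attrs


def failing(attrs):
    """Names of attributes whose VALUE is at or below a non-zero THRESH."""
    return sorted(
        name for name, a in attrs.items()
        if a["thresh"] > 0 and a["value"] <= a["thresh"]
    )


def load_cycles_per_hour(attrs):
    """Average head parks per powered-on hour, from the raw values."""
    return attrs["Load_Cycle_Count"]["raw"] / attrs["Power_On_Hours"]["raw"]


attrs = parse_rows(SMART_ROWS)
print("failing:", failing(attrs))
print("load cycles per hour: %.1f" % load_cycles_per_hour(attrs))
```

For this drive that works out to roughly 45 parks per powered-on hour, or about one every 80 seconds, averaged over its whole life.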

Here is a good post on another forum about the issue (which also seems to hit some of the new RED drives):

https://forums.freenas.org/index.php?threads/hacking-wd-greens-and-reds-with-wdidle3-exe.18171/

I have disabled intellipark on the rest of my green drives (since they are close to 5 years old and probably near failure). I also have some new RED drives, on which I have increased the timeout to 300 seconds (most come with a 300 second timeout, but older firmware is set to 8 seconds). From what I’ve been reading there is no physical difference between WD red and green drives; only the firmware differs. So if you are going to put green drives into a NAS / RAID or server environment, make sure you run wdidle3 and either disable intellipark or change its timeout to 300 seconds (then it’s pretty close to a red drive).

To check the current status:

wdidle3 /r

To disable intellipark:

wdidle3 /d

To set the timeout to 300 seconds (the maximum):

wdidle3 /s300

vmware : IDE to SCSI

I’ve found that vmware converter (this may be fixed in newer versions) creates vmware guests with an IDE controller. There can be performance issues if you stay with this controller, so the best bet is to change it to one of the vmware SCSI controllers.

Which controller you use depends on which Windows operating system you are running:  http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1006621

Guest Operating System       Adapter Type
Windows 2003, 2008, Vista    lsilogic
Windows NT, 2000, XP         buslogic
Linux                        lsilogic

http://sanbarrow.com/vmdk/vmx-ide2scsi.html

You can easily change the type of the virtual controller for a given disk.
Let’s have a look at an example.

# Disk DescriptorFile

version=1
CID=fffffffe
parentCID=ffffffff
createType="twoGbMaxExtentFlat"
# Extent description
RW 4193792 FLAT "diskname-f001.vmdk" 0
RW 2097664 FLAT "diskname-f002.vmdk" 0
# The Disk Data Base
#DDB
ddb.adapterType = "ide"
ddb.virtualHWVersion = "3"
ddb.geometry.cylinders = "6241"
ddb.geometry.heads = "16"
ddb.geometry.sectors = "63"

The disk above uses a virtual ide-controller.

ddb.adapterType = "buslogic"   This entry converts the disk into a SCSI disk with a BusLogic controller
ddb.adapterType = "lsilogic"   This entry converts the disk into a SCSI disk with an LSI Logic controller
ddb.adapterType = "ide"        This entry converts the disk into an IDE disk with an Intel IDE controller

This changes the hard disk, but doesn’t change the controller itself.

ide0.present = "TRUE"
ide1.present = "TRUE"
scsi0.virtualDev = "lsilogic"
scsi0.virtualDev = "buslogic"
scsi1.virtualDev = "lsilogic"
scsi1.virtualDev = "buslogic"

Use entries like these in your *.vmx file. The two virtualDev lines shown for each controller are alternatives, so keep only one per controller. By the way, you can have LSI Logic and BusLogic controllers in one VM.

Think twice before you make changes like this with a boot-disk.

Bluescreen 07b – mass-storage driver:
Activate the appropriate driver in the registry: intelide.sys, vmscsi.sys or symmpi.sys. You may have to add files as well.

If you get the above issue on a w2k8 box, you might be able to enable the LSI_SAS driver before you convert the machine to a SCSI controller.

  1. Boot machine with IDE controller
  2. Take a snapshot (for failback)
  3. Regedit and find the following key \\HKLM\SYSTEM\ControlSet001\Services\LSI_SAS
  4. Change the “Start” dword from 4 to 0
  5. Shutdown the machine
  6. Remove all the virtual disks (do not delete the disks, just remove them)
  7. Create copies of each .vmdk file (cp) (for failback)
  8. Edit the .vmdk file for each disk (vi)
  9. Change the ddb.adapterType value to "lsilogic" (if w2k8)
  10. Re-add existing disks (this should also bring in a LSI SAS controller)
  11. Boot the machine
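
Steps 7 to 9 above (copy the descriptor, then edit the adapter type) could be scripted along these lines. This is only a sketch under a few assumptions: it is pointed at the small text descriptor .vmdk and not at the large flat data files, the descriptor uses plain straight double quotes as a real one does, and the function names and the ".bak" backup suffix are made up for this example. The VM should be shut down first, as in the steps above.

```python
import re
import shutil
from pathlib import Path

# Adapter types mentioned in this post.
VALID_ADAPTERS = ("ide", "buslogic", "lsilogic")

# Matches a line such as: ddb.adapterType = "ide"
ADAPTER_LINE = re.compile(r'^([ \t]*ddb\.adapterType[ \t]*=[ \t]*)"[^"]*"', re.MULTILINE)


def set_adapter_type(descriptor_text, adapter):
    """Return descriptor_text with its ddb.adapterType value replaced."""
    if adapter not in VALID_ADAPTERS:
        raise ValueError("unknown adapter type: %r" % adapter)
    new_text, count = ADAPTER_LINE.subn(
        lambda m: m.group(1) + '"' + adapter + '"', descriptor_text
    )
    if count != 1:
        raise ValueError("expected one ddb.adapterType line, found %d" % count)
    return new_text


def convert_descriptor(path, adapter):
    """Back up the descriptor file next to itself, then rewrite it in place."""
    path = Path(path)
    backup = path.with_name(path.name + ".bak")  # hypothetical backup name
    shutil.copy2(path, backup)
    path.write_text(set_adapter_type(path.read_text(), adapter))
    return backup


# Example on an in-memory descriptor fragment.
sample = 'ddb.adapterType = "ide"\nddb.virtualHWVersion = "3"\n'
print(set_adapter_type(sample, "lsilogic"))
```

The check for exactly one matching line is deliberate: if the file has no ddb.adapterType entry, or more than one, it is safer to stop and look at it by hand than to guess.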

Black screen with cursor blinking in the topleft of the screen:
Write a new partition boot-sector.

vmware – measuring iscsi write performance

I picked this trick up from vmware support. If you’ve got your iSCSI all set up, you can drop to the shell (either ssh or console) and do this to measure your average write throughput.

time vmkfstools -c 10G /vmfs/volumes/san_vmfs/my_vm/fat_disk.vmdk -d eagerzeroedthick

Try a larger disk if this is too quick (free space permitting).

Essentially this makes the host create a fat (eager-zeroed thick) disk in the location above, and time reports how long the command took to execute. Then you can use your maths skills to work out the transfer rate: the disk size divided by the elapsed time.
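
For the maths, a tiny Python sketch is below. It assumes the 10G passed to vmkfstools means 10 GiB, and the 2m 8s timing is a made-up figure for illustration, not a measurement; the function name is hypothetical too.

```python
def write_throughput_mib_s(size_gib, minutes, seconds):
    """Average write rate in MiB/s for a disk of size_gib created in the given time."""
    elapsed = minutes * 60 + seconds
    if elapsed <= 0:
        raise ValueError("elapsed time must be positive")
    return size_gib * 1024 / elapsed


# Hypothetical example: the 10 GiB disk took 2 minutes 8 seconds to create.
print("%.1f MiB/s" % write_throughput_mib_s(10, 2, 8))
```

Plug in the "real" time reported for your own run; the other figures time prints are not needed here.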

While this is happening you can open another SSH session, type esxtop, then press “d” and watch the (d)isk throughput on the console. Pressing “v” will show you stats per (v)irtual machine.

zfs – checking your zpool throughput

This is quite a good diagnostic for checking your disk throughput. Try copying data to and from your zpool while you’re running this command on the host…

zpool iostat -v unprotected 2

               capacity     operations    bandwidth
pool          used  avail   read  write   read  write
-----------  -----  -----  -----  -----  -----  -----
unprotected  1.39T   668G     18      7  1.35M   161K
  c7d0        696G   403M      1      2  55.1K  21.3K
  c9d0        584G   112G      8      2   631K  69.3K
  c7d1        141G   555G      8      2   697K  70.0K
-----------  -----  -----  -----  -----  -----  -----

The above command will keep printing this output every 2 seconds (each report is the average over that interval). I’ve used it a few times to ensure that all disks are being used (in write operations) where needed. Of course read ops may not be spread across all disks, as that depends on where the data is…

As you can see in the output from my “unprotected” zpool, my disk “c7d0” is nearly full, so fewer write operations land on this disk. In my scenario most of my reads also come from this disk, because I copied most of the data into this zpool when it only had this single disk.
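
To put a number on how full each disk is, the used and avail columns can be turned into a percentage. The sketch below is just an illustration: the helper names are made up, the figures are typed in from the output above, and it assumes the K/M/G/T suffixes are binary multiples.

```python
# Multipliers for the size suffixes (assumed to be binary units).
UNITS = {"K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_size(text):
    """Convert a string such as '696G' or '403M' to a number of bytes."""
    suffix = text[-1].upper()
    if suffix in UNITS:
        return float(text[:-1]) * UNITS[suffix]
    return float(text)


def percent_used(used, avail):
    """Percentage of a disk's space that is used, from the used/avail columns."""
    u, a = parse_size(used), parse_size(avail)
    return 100.0 * u / (u + a)


# (used, avail) pairs copied from the zpool iostat output above.
DISKS = {
    "c7d0": ("696G", "403M"),
    "c9d0": ("584G", "112G"),
    "c7d1": ("141G", "555G"),
}

for name, (used, avail) in DISKS.items():
    print("%s: %.1f%% used" % (name, percent_used(used, avail)))
```

With these figures c7d0 comes out at about 99.9% used, against roughly 84% for c9d0 and 20% for c7d1.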

I’ve heard a rumor of a future zfs feature that will re-balance the data across all the disks (unsure if it’s live or on a set schedule).

Another way to show some disk throughput figures is to run the iostat command like so…

iostat -exn 10

extended device statistics
device    r/s    w/s   kr/s   kw/s wait actv  svc_t  %w  %b
cmdk17    1.0    0.0   71.5    0.0  0.0  0.0   10.9   0   1
cmdk18    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk19    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk20    0.8    0.0   33.5    0.0  0.0  0.0   13.5   0   1
cmdk21    0.4    0.0    0.5    0.0  0.0  0.0   15.5   0   1
cmdk22    0.8    0.0   66.3    0.0  0.0  0.0    9.0   0   1
cmdk23    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0
cmdk24    0.0    0.0    0.0    0.0  0.0  0.0    0.0   0   0

                     extended device statistics       ---- errors ---
r/s    w/s   kr/s   kw/s wait actv wsvc_t asvc_t  %w  %b s/w h/w trn tot device
0.0    0.0    0.0    0.0  0.0  0.0    0.0    0.0   0   0   0   0   0   0 c11d1
0.0    7.7    0.0   25.8  0.0  0.0    2.3    4.9   0   3   0   0   0   0 c8d0
0.0   17.6    0.0  238.0  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c9d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t0d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.2   0   0   0   0   0   0 c7t2d0
0.0    1.0    0.0    0.8  0.0  0.0    0.0    0.3   0   0   0   0   0   0 c7t3d0
0.7   21.1   29.9  315.0  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t4d0
0.7   20.9   29.8  314.9  0.0  0.0    0.0    1.7   0   2   0   0   0   0 c7t5d0
0.8   21.0   34.1  315.0  0.0  0.0    0.0    1.2   0   1   0   0   0   0 c7t6d0
0.5   20.8   21.3  314.8  0.0  0.0    0.0    1.1   0   1   0   0   0   0 c7t7d0

This should show you all your disks and update on a 10 second interval (the 10 in the command above). Copying data back and forth to your drives will show various stats.