hdrecover: A Shell repository from pLuster

hdrecover
---------

Attempts to recover a hard disk that has bad blocks on it.

WARNING: A hard disk with bad blocks on it is likely to fail! If you
value your data you should get a new hard disk instead of using this
program!

However, if you can't afford a new hard disk, or just like being
reckless with your data then this tool might just help you out!

Requirements:

* A 2.6 kernel
* A dead hard drive (well one with bad blocks - not completely dead)
* smartmontools is handy although not strictly needed
* root access

When to use:

First of all, if you haven't already got it - go download
smartmontools! (http://smartmontools.sf.net). Then run:

    smartctl -A /dev/hda

Where hda is whichever drive you're interested in. The output should
include three attributes that are interesting:

* Reallocated_Event_Count

  This is how many sectors have already been reallocated on the
  drive. We're hoping to get the hard disk to increase this number!

* Current_Pending_Sector

  The number of sectors that the drive thinks are dodgy. Bear in mind
  sometimes drives change their mind about whether a sector is bad or
  not - so this number can go down without a reallocation occuring.

* Offline_Uncorrectable

  This is the number of sectors that the drive has attempted to
  correct itself, but failed. Running the command:

      smartctl -t offline /dev/hda

  should cause the drive to test the sectors and attempt to fix
  them. Not all drives support this though.

So the attribute you're interested in at the moment is
Current_Pending_Sector. If this is not 0 then there's something up
with your disk. If Offline_Uncorrectable is less than
Current_Pending_Sectors then you may want to run an offline test which
may fix it, or you can dive straight in and use hdrecover which will
test (and attempt to fix) it itself.

So run (as root):

    hdrecover /dev/hda

and sit back and wait (or probably you're best off going and doing
something else while you wait). If the program comes across a bad
sector it will make several attempts at reading the sector. Between
each attempt it will randomly seek so as to reposition the head,
hopefully getting the data off the disk. If it succeeds then the drive
should automatically reallocate the sector. If after several attempts
the data still can't be read then the program will give you the option
of overwriting the data in the sector. This will force the drive to
reallocate the block.

WARNING: Overwriting the sector WILL cause data loss (it's fairly
obvious really!). Hopefully, since it is only one sector (or a handful
if there are more bad sectors on the drive), it won't be too
important. But bear in mind that you should at the very least run fsck
to check the integrity of the filesystem.

Once the program has finished (it takes a long time I'm afraid), a
summary will be printed showing how many blocks had errors. If you
repeat the smartctl command:

    smartctl -A /dev/hda

you should see that Current_Pending_Sector is now 0 and
Reallocated_Event_Count will have risen by the number of sectors the
drive decided to reallocate. Offline_Uncorrectable usually doesn't
immediately update - you have to either wait for the drive to update
it itself or run an offline test which should force the drive to
update it.

Have fun!

DISCLAIMER: Hard disks with bad sectors could fail at anytime -
although hdrecover appears to fix the drive you shouldn't trust it!
And just to re-inforce the GPL licence: if it in any way breaks your
computer then you get to keep both pieces! I accept no responsibility
for any damage it causes!

-------------------------------------------------------------------------------

Sample run:

SMART attribute table before:

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   100   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   064   063   020    Pre-fail  Always       -       4623
  4 Start_Stop_Count        0x0032   096   096   008    Old_age   Always       -       2882
  5 Reallocated_Sector_Ct   0x0033   099   099   020    Pre-fail  Always       -       5
  7 Seek_Error_Rate         0x000b   100   093   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   080   080   001    Old_age   Always       -       13227
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       1993
 13 Read_Soft_Error_Rate    0x000b   100   093   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   073   042    Old_age   Always       -       57
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       295108
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   099   020    Old_age   Always       -       1
198 Offline_Uncorrectable   0x0010   100   092   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   156   156   000    Old_age   Always       -       44

Notice there is 1 Current_Pending_Sector. Running hdrecover /dev/hdb
(it happens to be the second IDE drive):

# hdrecover /dev/hdb
hdrecover version 0.2
By Steven Price

Disk is 156355584 sectors big
Sector 28000 (00%) ETR: 1 hours 33 minutes 3 seconds
Sector 75920 (00%) ETR: 1 hours 8 minutes 36 seconds
Sector 121520 (00%) ETR: 1 hours 4 minutes 16 seconds
Sector 169400 (00%) ETR: 1 hours 1 minutes 27 seconds
Sector 217120 (00%) ETR: 59 minutes 55 seconds
Failed to read block at sector 257500, investigating futher...
Error at sector 257505
Attempting to pounce on it...
Attempt 1 from sector    117727958: FAILED
Attempt 2 from sector    149263364: FAILED
Attempt 3 from sector     84429675: FAILED
Attempt 4 from sector     81160097: FAILED
Attempt 5 from sector    156272094: FAILED
Attempt 6 from sector     75028634: FAILED
Attempt 7 from sector     63238892: FAILED
Attempt 8 from sector    116585171: FAILED
Attempt 9 from sector    102184049: FAILED
Attempt 10 from sector      5777672: FAILED
Attempt 11 from sector     81154822: FAILED
Attempt 12 from sector     69534256: FAILED
Attempt 13 from sector     90633165: FAILED
Attempt 14 from sector    150946922: FAILED
Attempt 15 from sector     18909543: FAILED
Attempt 16 from sector     29845152: FAILED
Attempt 17 from sector     24976866: FAILED
Attempt 18 from sector     52496797: FAILED
Attempt 19 from sector    104093022: FAILED
Attempt 20 from sector    127778471: FAILED
Couldn't recover sector
The data for this sector could not be recovered. However, destroying the
contents of this sector (ie writing zeros to it) should cause the hard disk
to reallocate it making the drive useable again
Do you want to destroy the sector? [y/n]:


/---------\
| WARNING |
\---------/

Up until this point you haven't lost any data because of this program
However, if you say yes below YOU WILL LOSE DATA!
Proceed at your own risk!

Type 'destroy data' to continue
destroy data

Wiping sector...
Checking sector is now readable...
Sector is now readable. But you have lost data!
Sector 257520 (00%) ETR: 30 hours 58 minutes 53 seconds
Sector 290700 (00%) ETR: 27 hours 35 minutes 18 seconds

... (loads more status info) ...

Sector 156197020 (99%) ETR: 3 seconds
Sector 156242080 (99%) ETR: 2 seconds
Sector 156287080 (99%) ETR: 1 seconds
Sector 156332200 (99%) ETR: 0 seconds
Summary:
  1 bad sectors found
  of those 0 were recovered
  and 1 could not be recovered and were destroyed causing data loss

*****************************************
* You have wiped a sector on this disk! *
*****************************************
*  If you care about the filesystem on  *
*  this disk you should run fsck on it  *
*  before mounting it to correct any    *
*  potential metadata errors            *
*****************************************
#

So the sector had to be wiped to recover it. fsck was then run to
check the metadata wasn't damaged (it wasn't) and smartctl now returns
the following attribute table:

SMART Attributes Data Structure revision number: 11
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0029   100   100   020    Pre-fail  Offline      -       0
  3 Spin_Up_Time            0x0027   064   063   020    Pre-fail  Always       -       4623
  4 Start_Stop_Count        0x0032   096   096   008    Old_age   Always       -       2882
  5 Reallocated_Sector_Ct   0x0033   099   099   020    Pre-fail  Always       -       5
  7 Seek_Error_Rate         0x000b   100   093   023    Pre-fail  Always       -       0
  9 Power_On_Hours          0x0012   080   080   001    Old_age   Always       -       13227
 10 Spin_Retry_Count        0x0026   100   100   000    Old_age   Always       -       0
 11 Calibration_Retry_Count 0x0013   100   090   020    Pre-fail  Always       -       0
 12 Power_Cycle_Count       0x0032   097   097   008    Old_age   Always       -       1993
 13 Read_Soft_Error_Rate    0x000b   100   093   023    Pre-fail  Always       -       0
194 Temperature_Celsius     0x0022   078   073   042    Old_age   Always       -       57
195 Hardware_ECC_Recovered  0x001a   100   100   000    Old_age   Always       -       295108
196 Reallocated_Event_Count 0x0010   099   099   020    Old_age   Offline      -       1
197 Current_Pending_Sector  0x0032   100   099   020    Old_age   Always       -       0
198 Offline_Uncorrectable   0x0010   100   092   000    Old_age   Offline      -       0
199 UDMA_CRC_Error_Count    0x001a   156   156   000    Old_age   Always       -       44

Current_Pending_Sector is now 0 so the drive is 'fixed' (for
now). Also note that Reallocated_Event_Count hasn't gone up (it was 1
before). This means that the drive now has confidence in the sector
that has been overwritten and has not used a spare sector. Whether you
still have confidence in the drive is another matter!
pLuster/hdrecover