hdrecover --------- Attempts to recover a hard disk that has bad blocks on it. WARNING: A hard disk with bad blocks on it is likely to fail! If you value your data you should get a new hard disk instead of using this program! However, if you can't afford a new hard disk, or just like being reckless with your data then this tool might just help you out! Requirements: * A 2.6 kernel * A dead hard drive (well one with bad blocks - not completely dead) * smartmontools is handy although not strictly needed * root access When to use: First of all, if you haven't already got it - go download smartmontools! (http://smartmontools.sf.net). Then run: smartctl -A /dev/hda Where hda is whichever drive you're interested in. The output should include three attributes that are interesting: * Reallocated_Event_Count This is how many sectors have already been reallocated on the drive. We're hoping to get the hard disk to increase this number! * Current_Pending_Sector The number of sectors that the drive thinks are dodgy. Bear in mind sometimes drives change their mind about whether a sector is bad or not - so this number can go down without a reallocation occuring. * Offline_Uncorrectable This is the number of sectors that the drive has attempted to correct itself, but failed. Running the command: smartctl -t offline /dev/hda should cause the drive to test the sectors and attempt to fix them. Not all drives support this though. So the attribute you're interested in at the moment is Current_Pending_Sector. If this is not 0 then there's something up with your disk. If Offline_Uncorrectable is less than Current_Pending_Sectors then you may want to run an offline test which may fix it, or you can dive straight in and use hdrecover which will test (and attempt to fix) it itself. So run (as root): hdrecover /dev/hda and sit back and wait (or probably you're best off going and doing something else while you wait). If the program comes across a bad sector it will make several attempts at reading the sector. Between each attempt it will randomly seek so as to reposition the head, hopefully getting the data off the disk. If it succeeds then the drive should automatically reallocate the sector. If after several attempts the data still can't be read then the program will give you the option of overwriting the data in the sector. This will force the drive to reallocate the block. WARNING: Overwriting the sector WILL cause data loss (it's fairly obvious really!). Hopefully, since it is only one sector (or a handful if there are more bad sectors on the drive), it won't be too important. But bear in mind that you should at the very least run fsck to check the integrity of the filesystem. Once the program has finished (it takes a long time I'm afraid), a summary will be printed showing how many blocks had errors. If you repeat the smartctl command: smartctl -A /dev/hda you should see that Current_Pending_Sector is now 0 and Reallocated_Event_Count will have risen by the number of sectors the drive decided to reallocate. Offline_Uncorrectable usually doesn't immediately update - you have to either wait for the drive to update it itself or run an offline test which should force the drive to update it. Have fun! DISCLAIMER: Hard disks with bad sectors could fail at anytime - although hdrecover appears to fix the drive you shouldn't trust it! And just to re-inforce the GPL licence: if it in any way breaks your computer then you get to keep both pieces! I accept no responsibility for any damage it causes! ------------------------------------------------------------------------------- Sample run: SMART attribute table before: SMART Attributes Data Structure revision number: 11 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x0029 100 100 020 Pre-fail Offline - 0 3 Spin_Up_Time 0x0027 064 063 020 Pre-fail Always - 4623 4 Start_Stop_Count 0x0032 096 096 008 Old_age Always - 2882 5 Reallocated_Sector_Ct 0x0033 099 099 020 Pre-fail Always - 5 7 Seek_Error_Rate 0x000b 100 093 023 Pre-fail Always - 0 9 Power_On_Hours 0x0012 080 080 001 Old_age Always - 13227 10 Spin_Retry_Count 0x0026 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0013 100 090 020 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 097 097 008 Old_age Always - 1993 13 Read_Soft_Error_Rate 0x000b 100 093 023 Pre-fail Always - 0 194 Temperature_Celsius 0x0022 078 073 042 Old_age Always - 57 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 295108 196 Reallocated_Event_Count 0x0010 099 099 020 Old_age Offline - 1 197 Current_Pending_Sector 0x0032 100 099 020 Old_age Always - 1 198 Offline_Uncorrectable 0x0010 100 092 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x001a 156 156 000 Old_age Always - 44 Notice there is 1 Current_Pending_Sector. Running hdrecover /dev/hdb (it happens to be the second IDE drive): # hdrecover /dev/hdb hdrecover version 0.2 By Steven Price Disk is 156355584 sectors big Sector 28000 (00%) ETR: 1 hours 33 minutes 3 seconds Sector 75920 (00%) ETR: 1 hours 8 minutes 36 seconds Sector 121520 (00%) ETR: 1 hours 4 minutes 16 seconds Sector 169400 (00%) ETR: 1 hours 1 minutes 27 seconds Sector 217120 (00%) ETR: 59 minutes 55 seconds Failed to read block at sector 257500, investigating futher... Error at sector 257505 Attempting to pounce on it... Attempt 1 from sector 117727958: FAILED Attempt 2 from sector 149263364: FAILED Attempt 3 from sector 84429675: FAILED Attempt 4 from sector 81160097: FAILED Attempt 5 from sector 156272094: FAILED Attempt 6 from sector 75028634: FAILED Attempt 7 from sector 63238892: FAILED Attempt 8 from sector 116585171: FAILED Attempt 9 from sector 102184049: FAILED Attempt 10 from sector 5777672: FAILED Attempt 11 from sector 81154822: FAILED Attempt 12 from sector 69534256: FAILED Attempt 13 from sector 90633165: FAILED Attempt 14 from sector 150946922: FAILED Attempt 15 from sector 18909543: FAILED Attempt 16 from sector 29845152: FAILED Attempt 17 from sector 24976866: FAILED Attempt 18 from sector 52496797: FAILED Attempt 19 from sector 104093022: FAILED Attempt 20 from sector 127778471: FAILED Couldn't recover sector The data for this sector could not be recovered. However, destroying the contents of this sector (ie writing zeros to it) should cause the hard disk to reallocate it making the drive useable again Do you want to destroy the sector? [y/n]: /---------\ | WARNING | \---------/ Up until this point you haven't lost any data because of this program However, if you say yes below YOU WILL LOSE DATA! Proceed at your own risk! Type 'destroy data' to continue destroy data Wiping sector... Checking sector is now readable... Sector is now readable. But you have lost data! Sector 257520 (00%) ETR: 30 hours 58 minutes 53 seconds Sector 290700 (00%) ETR: 27 hours 35 minutes 18 seconds ... (loads more status info) ... Sector 156197020 (99%) ETR: 3 seconds Sector 156242080 (99%) ETR: 2 seconds Sector 156287080 (99%) ETR: 1 seconds Sector 156332200 (99%) ETR: 0 seconds Summary: 1 bad sectors found of those 0 were recovered and 1 could not be recovered and were destroyed causing data loss ***************************************** * You have wiped a sector on this disk! * ***************************************** * If you care about the filesystem on * * this disk you should run fsck on it * * before mounting it to correct any * * potential metadata errors * ***************************************** # So the sector had to be wiped to recover it. fsck was then run to check the metadata wasn't damaged (it wasn't) and smartctl now returns the following attribute table: SMART Attributes Data Structure revision number: 11 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x0029 100 100 020 Pre-fail Offline - 0 3 Spin_Up_Time 0x0027 064 063 020 Pre-fail Always - 4623 4 Start_Stop_Count 0x0032 096 096 008 Old_age Always - 2882 5 Reallocated_Sector_Ct 0x0033 099 099 020 Pre-fail Always - 5 7 Seek_Error_Rate 0x000b 100 093 023 Pre-fail Always - 0 9 Power_On_Hours 0x0012 080 080 001 Old_age Always - 13227 10 Spin_Retry_Count 0x0026 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0013 100 090 020 Pre-fail Always - 0 12 Power_Cycle_Count 0x0032 097 097 008 Old_age Always - 1993 13 Read_Soft_Error_Rate 0x000b 100 093 023 Pre-fail Always - 0 194 Temperature_Celsius 0x0022 078 073 042 Old_age Always - 57 195 Hardware_ECC_Recovered 0x001a 100 100 000 Old_age Always - 295108 196 Reallocated_Event_Count 0x0010 099 099 020 Old_age Offline - 1 197 Current_Pending_Sector 0x0032 100 099 020 Old_age Always - 0 198 Offline_Uncorrectable 0x0010 100 092 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x001a 156 156 000 Old_age Always - 44 Current_Pending_Sector is now 0 so the drive is 'fixed' (for now). Also note that Reallocated_Event_Count hasn't gone up (it was 1 before). This means that the drive now has confidence in the sector that has been overwritten and has not used a spare sector. Whether you still have confidence in the drive is another matter!