Background, Read Ignoring ECC, Writing = BAD!

Background

Ok, there’s a lot of background you need to be filled in on to understand just why a device like this is important. I won’t be spoon-feeding you the gospel of data recovery shops; this is information that I personally agree with, and it has been confirmed by many others. The short version is that when a drive in questionable condition holds vital information that must be recovered, software tools alone are just not sufficient for a hardware data recovery. Allow me to explain:

Most of the time, when a drive has issues that prevent a system from booting, those issues are bad sectors. Unreadable sectors lead to timeouts at stages where the system is not good at handling them. Since operating systems handle read timeouts in critical file system areas poorly, or even harmfully, using a tool like SpinRite to force rewrites can potentially get you back up and running, but there is a chance of losing some data or corrupting some file system pointers in the process. SpinRite (and other similar tools; I’m not trying to single it out here, but it's arguably the most popular) can only rewrite its 'best guess' of the bad sector contents back to the drive, and since these tools are not file system aware, they have no idea what type of data they are rewriting. File pointers corrupted by this process may be silently disregarded by the OS, meaning you’d likely not realize some files or directories went missing until it was too late (after their contents had been overwritten). As someone with a formal background in this stuff (hard drive forensic analysis for the Navy), I can say that when you really care about the contents of a drive in questionable condition, writing *anything* back to the drive is the worst thing you can do! It is just not something you do when you have precious data on a potentially failing drive. If the data is not that important and you’re just trying to get your kid’s laptop back up and running, go for it, but if the data is absolutely critical to you, ALL STOP!

SpinRite is a great tool for maintenance and for forcing a drive to map out a few bad sectors, but I don't recommend it for recovering vital data from potentially failing drives, because it relies on that very same failing drive to store anything it has recovered.

…and by no writing, I mean no mounting either, especially under Windows or OSX. Resist all temptation to just toss that failing drive into a USB dock and cross your fingers. The two operating systems mount file systems in different ways, but both do so in a manner that can further damage a drive that is already on its way out. Windows flips a few ‘dirty bits’ during the mounting process, and it has to write to the drive to do so. Further, on some file system types, each file read results in an update to that file’s ‘Last Accessed’ timestamp. There goes a bunch of other potential writes. Even if you were successfully reading files, suspected bad heads could be corrupting those file table update writes in the process.
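If you take the Linux route described below for Ddrescue, it is worth confirming that nothing has auto-mounted the source drive before you start. The sketch below is a hypothetical helper, not part of any existing tool, that scans text in the `/proc/mounts` format (first whitespace-separated field is the mount source) for the whole disk or any of its partitions. It assumes plain `/dev/sdX` or `/dev/nvmeXnY` style names; a drive reached through LVM or `/dev/mapper` would not be caught by it.

```python
import re
from typing import List


def mounted_partitions(device: str, mounts_text: str) -> List[str]:
    """Return the mount sources in /proc/mounts-style text that belong to `device`.

    Matches the whole disk (e.g. /dev/sdb) and its partitions (/dev/sdb1, or
    /dev/nvme0n1p1 style), but not a different disk such as /dev/sdba.
    """
    pattern = re.compile(re.escape(device) + r"(p?\d+)?$")
    hits = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and pattern.match(fields[0]):
            hits.append(fields[0])
    return hits


sample = (
    "/dev/sda1 / ext4 rw,relatime 0 0\n"
    "/dev/sdb1 /media/usb vfat rw 0 0\n"
    "proc /proc proc rw 0 0\n"
)
print(mounted_partitions("/dev/sdb", sample))  # ['/dev/sdb1']
```

On a Linux box you would feed it the real file, e.g. `mounted_partitions("/dev/sdX", open("/proc/mounts").read())`, where `/dev/sdX` is a placeholder for the suspect drive. An empty list means nothing from that drive is currently mounted.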

Read Ignoring ECC

Before moving on, a quick note about software tools attempting to read from drives in a manner that ignores ECC. Some of these tools can repeatedly attempt to read a bad sector while instructing the drive to ignore its own error correction data (ECC). The idea is that if you can get the 'raw' data from the drive, you can statistically analyze repeated attempts and deduce the most likely content of that sector. While that worked well for most drives back in 2004 (when SpinRite 6.0 was released), many modern drives respond to these requests with data that has nothing to do with the contents of the sector being read:

As you can see in the video above, there are plenty of cases that may throw off SpinRite's DynaStat engine by feeding it false information, and since SpinRite is not an imager, it is forced to write that new data back to the same sector, potentially corrupting that sector with irrelevant information it mistakenly thought was good data. *EDIT* I was able to circle back with Steve Gibson (creator of SpinRite), and confirm that pre-run checks are performed so that DynaStat only kicks in for drives that can safely do so.

While DeepSpar's older, more advanced tools can ignore ECC and perform statistical analysis, the newer RapidSpar *does not* support that function. Because most modern drives no longer handle that command in a way that yields raw sector data, DeepSpar leaves it as an advanced function in their higher-end recovery tools, to be used at the operator's discretion.
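To make the "statistically analyze repeated attempts" idea concrete, here is a generic per-bit majority vote over repeated raw reads of one sector. This is not DynaStat's or DeepSpar's actual algorithm, just a minimal sketch assuming every read has the same length and that read errors are random. The second example shows why the approach falls apart on drives that answer with unrelated data: consistent garbage simply wins the vote.

```python
from typing import List


def majority_vote_sector(reads: List[bytes]) -> bytes:
    """Combine repeated raw reads of one sector with a per-bit majority vote.

    Assumes all reads have the same length and that bit errors are random and
    independent. Ties (possible with an even number of reads) resolve to 0.
    """
    if not reads:
        raise ValueError("need at least one read")
    length = len(reads[0])
    if any(len(r) != length for r in reads):
        raise ValueError("all reads must be the same length")
    out = bytearray(length)
    for i in range(length):
        for bit in range(8):
            ones = sum((r[i] >> bit) & 1 for r in reads)
            if 2 * ones > len(reads):  # a strict majority of reads had this bit set
                out[i] |= 1 << bit
    return bytes(out)


# Three noisy reads that mostly agree: the vote recovers 0x0f.
print(majority_vote_sector([b"\x0f", b"\x0f", b"\x0e"]))          # b'\x0f'
# A drive that answers raw-read requests with unrelated data (zeros here)
# out-votes the single genuine read, and the result is confidently wrong.
print(majority_vote_sector([b"\x00", b"\x00", b"\x00", b"\x0f"]))  # b'\x00'
```

Nothing in the vote can tell a genuine read from a consistent but unrelated answer, which is why a tool has to check up front whether a drive's raw reads can be trusted at all.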

Writing = BAD!

What’s so bad about writing? If the drive had a fault that resulted in bad sectors, it stands to reason that additional writes (or any additional activity) can lead to more bad sectors. Hard drives can ‘map out’ unreadable (unstable) sectors, but they typically hold off on doing so until you try to write to a sector previously marked as suspect. You can find the count of suspect sectors the drive has identified by checking the SMART ‘Current Pending Sector’ (C5, decimal 197) value. Writing over a sector marked in such a way causes the drive to redirect that address to a spare sector elsewhere on the drive. The drive keeps track of which sectors have been swapped out using a ‘Grown’ defect list (G-list for short). This list is stored in a special service area of the platters, alongside the drive’s other firmware modules. If the list is on the platters, how do you update it? You guessed it: another write that can potentially be corrupted. Worse, if one of these particular writes goes wrong, the checksum of the associated firmware module (which handles drive I/O) fails, and you may end up with a drive that is unable to respond to commands the next time it is powered up.

SMART data revealing a sector that was previously unreadable. This sector will be remapped on its next (unlikely) successful read. If this sector is overwritten, the drive will map it out and begin using a spare in its place. Once that has happened, the original sector/data can only be read using very special tools and techniques.
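The pending-then-remap behaviour described above can be sketched as a toy model. `ToyDrive` and everything in it are hypothetical; real firmware differs by vendor, and some drives may simply clear the pending flag instead of reallocating if the overwritten sector verifies fine. The model follows the sequence described in this section: a failed read only raises the pending count (the C5 analog), while a write to a pending LBA pulls a spare, records it in the G-list, and costs an extra service-area write.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class ToyDrive:
    """Hypothetical, heavily simplified model of pending sectors and G-list remapping."""
    spares: List[int]                                     # unused spare physical sectors
    pending: Set[int] = field(default_factory=set)        # suspect LBAs (SMART C5 analog)
    g_list: Dict[int, int] = field(default_factory=dict)  # LBA -> spare physical sector
    service_area_writes: int = 0                          # G-list updates written to the platters

    def physical_location(self, lba: int) -> int:
        # With no G-list entry, the LBA is assumed to map to its own physical sector.
        return self.g_list.get(lba, lba)

    def report_unreadable(self, lba: int) -> None:
        # A failed read only flags the sector as pending; nothing is remapped yet.
        self.pending.add(lba)

    def write(self, lba: int) -> int:
        """Write to an LBA and return the physical sector that receives the data."""
        if lba in self.pending:
            if not self.spares:
                raise RuntimeError("no spare sectors left")
            self.pending.discard(lba)
            self.g_list[lba] = self.spares.pop(0)
            # Updating the G-list is itself another write, to the service area.
            self.service_area_writes += 1
        return self.physical_location(lba)

    @property
    def current_pending_sector(self) -> int:
        return len(self.pending)


drive = ToyDrive(spares=[1_000_000, 1_000_001])
drive.report_unreadable(4242)
print(drive.current_pending_sector)   # 1
print(drive.write(4242))              # 1000000: the data lands on a spare
print(drive.current_pending_sector)   # 0
print(drive.physical_location(4242))  # 1000000: physical sector 4242 is no longer addressed
print(drive.service_area_writes)      # 1
```

The last lookups are the point: once the G-list entry exists, LBA 4242 resolves to the spare, so ordinary reads never touch the original physical sector again. That is why getting the old data back afterwards takes the very special tools mentioned above.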

Now that we've driven home the need to prevent writes to flaky drives, I should point out that there do exist software tools that limit themselves to only reading from a source drive. One such tool is Ddrescue, which runs under Linux and can therefore avoid the Windows/OSX mounting issues mentioned above. Ddrescue is handy in that it can skip over areas containing bad sectors in favor of good ones, performing multiple passes in an attempt to get a more complete image in the least amount of time possible. There are still a couple of issues with this approach. First, Ddrescue is not file system aware, meaning it can spend a lot of time attempting to image unused or irrelevant areas of the source drive. Second, and perhaps most important, all software tools must rely on the drive's own timeout for bad or slow read attempts.

A graphical representation of Ddrescue skipping around bad areas of a disk.
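The skip-ahead, multi-pass idea behind Ddrescue can be illustrated with a simplified two-pass loop. This is not Ddrescue's actual algorithm (which tracks its progress in a mapfile and works through several phases); it is a hypothetical sketch in which `read_sector` stands in for a read-only call to the source drive and returns `None` on failure. Note that it never writes to the source, and that it does nothing about the timeout problem: in a real run, every `None` would still have cost a full drive timeout.

```python
from typing import Callable, Dict, Optional

# Hypothetical read callback: returns the sector's bytes, or None if the read fails.
ReadFn = Callable[[int], Optional[bytes]]


def two_pass_image(read_sector: ReadFn, total_sectors: int, skip: int = 64) -> Dict[int, bytes]:
    """Read-only, two-pass imaging sketch.

    Pass 1 copies the easy areas quickly: after a failed read it jumps ahead
    by `skip` sectors instead of grinding on the damaged region.
    Pass 2 revisits every sector pass 1 did not recover and tries it once more.
    """
    image: Dict[int, bytes] = {}
    lba = 0
    while lba < total_sectors:
        data = read_sector(lba)
        if data is None:
            lba += skip  # leave the bad area for later
            continue
        image[lba] = data
        lba += 1
    # Pass 2: retry everything still missing, including the sectors skipped over.
    for lba in range(total_sectors):
        if lba not in image:
            data = read_sector(lba)
            if data is not None:
                image[lba] = data
    return image


bad = set(range(100, 110))  # hypothetical damaged region


def fake_read(lba: int) -> Optional[bytes]:
    return None if lba in bad else lba.to_bytes(4, "little")


img = two_pass_image(fake_read, total_sectors=1000)
print(sorted(set(range(1000)) - set(img)))  # only the ten sectors 100 through 109 remain missing
```

The design trade-off is the `skip` distance: a larger jump gets more of the healthy surface imaged early, while a smaller one wastes fewer good sectors inside the skipped window before pass 2 comes back for them.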

Since mechanical devices tend to degrade further after the first few signs of trouble, realize that with many data recovery efforts, you may be operating on borrowed time (one such case here). Software-based imaging tools cannot perform one function that is critical for speeding through bad or slow sectors: they cannot issue the hardware-based Reset command. Only dedicated recovery hardware can do this, so software tools must wait for the drive's own timeout on every single failed read attempt, a process that can take longer than 20 seconds *per sector*. Multiply that out, and some drives would take weeks or months to image. One of my previous software imaging attempts took a week to reach 1%, and that was only a 400GB drive! That same drive later failed completely. Had I had access to a better tool at the time, I would have recovered far more of that drive before it failed, easing my recovery efforts.
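To put rough numbers on "multiply that out", here is the arithmetic as a small Python sketch. The 20-second timeout and the 1%-per-week, 400GB figures come from the text above; the 100,000 unreadable-sector count is a purely hypothetical input, and the rate calculation assumes decimal gigabytes and a constant imaging speed, which a degrading drive will not actually hold.

```python
SECONDS_PER_DAY = 86_400
SECONDS_PER_WEEK = 7 * SECONDS_PER_DAY


def timeout_days(bad_sectors: int, timeout_s: float = 20.0) -> float:
    """Days spent purely waiting on drive timeouts, one timeout per unreadable sector."""
    return bad_sectors * timeout_s / SECONDS_PER_DAY


def weeks_for_full_image(percent_per_week: float) -> float:
    """Extrapolate a constant imaging rate to the whole drive."""
    return 100.0 / percent_per_week


def average_rate_bytes_per_s(capacity_bytes: int, percent_per_week: float) -> float:
    """Average throughput implied by imaging a given percentage of the drive per week."""
    return capacity_bytes * (percent_per_week / 100.0) / SECONDS_PER_WEEK


print(timeout_days(100_000))                              # about 23.1 days (hypothetical sector count)
print(weeks_for_full_image(1.0))                          # 100.0
print(round(average_rate_bytes_per_s(400 * 10**9, 1.0)))  # 6614 bytes per second
```

At 1% per week, that 400GB drive averaged roughly 6.6 kB/s and would have needed about 100 weeks for a full image, and that is the optimistic case where the rate never got worse.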
