Introduction and Internals
Could Western Digital's new Red be in your next NAS?
Introduction:
I'm going to let the cat out of the bag right here and now. Everyone's home RAID is likely an accident waiting to happen. If you're using regular consumer drives in a large array, there are some very simple (and likely) scenarios that can cause it to completely fail. I'm guilty of operating under this same false hope – I have an 8-drive array of 3TB WD Caviar Greens in a RAID-5. For those uninitiated, RAID-5 is where one drive worth of capacity is volunteered for use as parity data, which is distributed amongst all drives in the array. This trick allows for no data loss in the case where a single drive fails. The RAID controller can simply figure out the missing data by running the extra parity through the same formula that created it. This is called redundancy, but I propose that it's not.
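To make the parity trick concrete, here is a minimal sketch in Python. It assumes the common XOR form of RAID-5 parity; the block contents and the `xor_blocks` helper are made up for illustration and are not taken from any particular controller.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, one byte column at a time."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Three hypothetical data blocks from one stripe, each on a different drive.
data = [b"NAS!", b"RAID", b"WDRD"]

# The parity block is the XOR of all data blocks in the stripe.
parity = xor_blocks(data)

# Pretend the drive holding data[1] has failed: run the survivors plus the
# parity through the same formula to get the missing block back.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)
print(rebuilt)  # b'RAID'
```

Note that the rebuild needs every surviving block in the stripe. If a second block is unreadable at the same time, there is nothing left to XOR against, and that is the weakness the rest of this page is about.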
Continue on for our full review of the solution to this not-yet-fully-described problem!
Since I'm also guilty here with my huge array of Caviar Greens, let me also say that every few weeks I have a batch job that reads *all* data from that array. Why on earth would I need to occasionally and repeatedly read 21TB of data from something that should already be super reliable? Here's the failure scenario for what might happen to me if I didn't:
- Array starts off operating as normal, but drive 3 has a bad sector that cropped up a few months back. This has gone unnoticed because the bad sector was part of a rarely accessed file.
- During operation, drive 1 encounters a new bad sector.
- Since drive 1 is a consumer drive it goes into a retry loop, repeatedly attempting to read and correct the bad sector.
- The RAID controller exceeds its timeout threshold waiting on drive 1 and marks it offline.
- Array is now in degraded status with drive 1 marked as failed.
- User replaces drive 1. RAID controller initiates rebuild using parity data from the other drives.
- During rebuild, RAID controller encounters the bad sector on drive 3.
- Since drive 3 is a consumer drive it goes into a retry loop, repeatedly attempting to read and correct the bad sector.
- The RAID controller exceeds its timeout threshold waiting on drive 3 and marks it offline.
- Rebuild fails.
At this point the way forward varies from controller to controller, but the long and short of it is that the data is at extreme risk of loss. There are ways to get it all back (most likely without that one bad sector on drive 3), but none of them are particularly easy. Now you may be asking yourself how enterprises run huge RAIDs and don't see this sort of problem? The answer is Time Limited Error Recovery – where the hard drive assumes it is part of an array, assumes there is redundancy, and is not afraid to quickly tell the host controller that it just can't complete the current I/O request. Here's how that scenario would have played out if the drives implemented some form of TLER:
- Array starts off operating as normal, but drive 3 developed a bad sector several weeks ago. This went unnoticed because the bad sector was part of a rarely accessed file.
- During operation, drive 1 encounters a new bad sector.
- Drive 1 makes a few read attempts and then reports a CRC error to the RAID controller.
- The RAID controller maps out the bad sector, locating it elsewhere on the drive. The missing sector is rebuilt using parity data from the other drives in the array.
- Array continues normal operation, with the error added to its event log.
The above scenario is what would play out with an Areca RAID controller (I've verified this personally). Other controllers may behave differently. A controller unable to do a bad sector remap might have just marked drive 1 as bad, but the key is that the rebuild would be much less likely to fail as drive 3 would not drop completely offline once the controller ran into the additional bad sector. The moral of this story is that typical consumer grade drives have data error timeouts that are far longer than the drive offline timeout of typical RAID controllers, and without some form of TLER, two bad sectors (totaling 1024 bytes) is all that's required to put multiple terabytes of data in grave danger.
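The two scenarios come down to a single comparison: how long the drive keeps retrying versus how long the controller is willing to wait. A rough sketch of that comparison follows. All three timing values are assumptions chosen for illustration, not measured figures for any specific drive or controller.

```python
# Assumed values for illustration only; real drives and controllers vary.
CONTROLLER_TIMEOUT_S = 8.0   # how long the controller waits before dropping a drive
DESKTOP_RECOVERY_S = 60.0    # a consumer drive stuck in a long retry loop
TLER_RECOVERY_S = 7.0        # a drive that gives up quickly and reports the error

def drive_stays_online(recovery_s, controller_timeout_s=CONTROLLER_TIMEOUT_S):
    """True if the drive answers (even with an error) before the controller gives up."""
    return recovery_s <= controller_timeout_s

print(drive_stays_online(DESKTOP_RECOVERY_S))  # False: drive is marked offline
print(drive_stays_online(TLER_RECOVERY_S))     # True: error reported, drive stays in the array
```

In the first case a single bad sector costs you the whole drive. In the second it costs you one sector, which the controller can rebuild from parity as long as the rest of the array is healthy.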
Update: These are now for sale on Newegg.com!
The Solution:
The solution should be simple – just get some drives with TLER. The problem is that until now those were prohibitively expensive. Enterprise drives have all sorts of added features like accelerometers and pressure sensors to compensate for sliding in and out of a server rack while operating, as well as dealing with rapid pressure changes that take place when the server room door opens and the forced air circulation takes a quick detour. Those features just aren't needed in that home NAS sitting on your bookshelf. What *is* needed is a Caviar Green with TLER, and Western Digital aims to deliver that, among other things:
For this review I have assembled all of the usual suspects that I've seen folks put into their home arrays (myself included). We have all three flavors of the Caviar Green (6Gb/sec, 3Gb/sec, and the AV-specific one meant for DVRs). I've also tossed in the Caviar Black and for the rich folks out there the enterprise grade RE4-GP and its faster spinning RE4 cousin. The 1TB VelociRaptor is included to give us some high-end perspective.
The Dirty Dozen (well, 2/3rds of it anyway).
Internals:
Sorry I won't be cracking this guy completely open, but here are some shots of what was accessible, including the back of the PCB:
A fast 6Gb/sec Marvell controller is assisted by 64MB of Samsung DDR2 cache. This should give us some good burst figures for sure.
Areca cards are great! However, they are very pricey, which is why I posted the question above your post.
BUT as a person who owns one of these cards, I’d move away from it. Why? ZFS is a better storage system. Grab an AMD CPU (because it supports ECC), get ECC RAM, and load an OS that supports ZFS. From what I hear it is a much more reliable storage system than RAID.
I would have done this myself, but I didn’t learn about ZFS until after my purchase. Secondly, I have nearly 6TBs across 8 drives. Moving that amount of data would be a pain.
Lastly, if you really decide to go with an Areca card, try to find one second hand. I picked up my 1231ML for $375 used. Run a Google search on the key phrase “FS Areca” and sort by date.
Good luck!
Do any of you see a problem using these drives in a 12 bay NAS running FreeNAS? ZFS/2
The WD site just says for up to 5 bays. Is this just marketing hype? Or do you think these drives will be OK for large bay NAS enclosures?
Thoughts? Thanks
As I understand it, it’s because the Red drives lack vibration sensors and pressure sensors.
However, I’m also thinking about using 15 of these babies in a file server for private use.
I’m really wondering if this will actually be a problem or not….
-JKJK-
Hi,
I would really love to hear more about your 15 drive setup. Care to share some more details?
Thanks,
-jj
Sorry about that triple post … I got a “page could not be found” error each time I tried to post.
A while back the WD Red 3TB was selling for $169.99. Now that I want to buy, it is about $259.99. Any idea why the sudden increase in price, and whether it will drop back below $200?
Would you recommend the Red series if you don’t use a RAID solution? I ask because I’ll be running Windows Home Server 2011 with 4 drives, and (lessons learned) I’ll robocopy my important data once a week to an external eSATA drive.
“I have an 8-drive array of 3TB WD Caviar Greens in a RAID-5.”
This array is just a disaster waiting to happen. Consider the situation when one of your drives fails: you’re left with a 7-drive RAID 0 array! And if the thought of housing your important data on such a large RAID 0 array doesn’t make your pulse race, you’re either an idiot (sorry) or the data wasn’t that important to begin with.
When you replace the failed drive of an (n-1)-drive degraded array, the rebuild process places the greatest strain on the array when it is most vulnerable. It may seem counterintuitive, but at some point an n-drive RAID 5 array is more likely to fail and suffer complete data loss than a single large disk with no redundancy. I think n=8 is well beyond that point. This has actually been studied scientifically here: http://media.netapp.com/documents/rp-0046.pdf
Conclusion: large RAID 5 arrays are not safe. You need at least a RAID 6 configuration.
As for whether TLER actually improves the safety of the array, I think some of the other comments have covered this already.
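For a rough feel of the rebuild risk described above, here is a back-of-the-envelope sketch. It assumes an unrecoverable read error rate of 1 per 10^14 bits, a figure often quoted on consumer drive datasheets but not taken from this review, and it assumes errors are independent, which is a crude simplification. The function name is made up for illustration.

```python
import math

def p_unreadable_during_rebuild(surviving_drives, drive_bytes, ure_per_bit):
    """Chance of at least one unreadable bit while reading every surviving drive in full."""
    bits_read = surviving_drives * drive_bytes * 8
    # 1 - (1 - p)^n, written with log1p/expm1 so tiny p does not round away.
    return -math.expm1(bits_read * math.log1p(-ure_per_bit))

# 8-drive RAID 5 of 3TB drives with one failed: 7 survivors must be read end to end.
# ure_per_bit=1e-14 is an ASSUMED datasheet-style figure, not a measurement.
p = p_unreadable_during_rebuild(surviving_drives=7, drive_bytes=3e12, ure_per_bit=1e-14)
print(f"{p:.0%}")  # about 81% under these assumptions
```

Real drives often do better than their datasheet figure, so treat the result as a worst-case illustration of why the rebuild is the dangerous moment, not as a prediction.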
Hi,
I had some Green disks in my NAS and had to apply the wdidle3 update to get around the 8-second head park issue. Is the same fix also needed on these drives?
I find it interesting (and very depressing and confusing) that the only Raid storage devices listed on WD’s certified Red drive compatibility list are devices that previously included Green drives in the manufacturer’s own compatibility lists.
These include Drobos, and the Synology, QNAP, et al software Raid NAS boxes. It is my understanding that the NAS boxes all use Linux MDADM under the covers of their proprietary user interfaces. It is also my understanding that Synology SHR, for example, is simply a well built user interface over Linux LVM.
Suspiciously absent from the list are any of the hardware Raid controllers and devices that use various hardware Raid controllers.
If the above is factually correct then it calls into question the true value of Red drives. We can argue the merits of using Red drives in hardware Raid solutions, but the fact is that if you do have a problem, and you attempt to get support from your Raid controller maker, he will give you a simple response: “We don’t support Red drives so we cannot tell you why your Raid array dropped (or regularly drops).”
Seems to me there is little or no value in attempting to use Raid for an increased level of protection if the controller maker will not support it. It is, in that way, no better than using Green drives.
It calls into question the value of Red drives. I’ve read a lot of discussion (mostly speculation) about these new drives but never seen my concerns mentioned or discussed.
As an afterthought, I use a SansDigital TR4UTBPN 4 bay eSata/USB3 external Raid/JBOD enclosure. I know, having read their now defunct support forum for many years, that in every case where someone reported Raid array failure or frequent rebuilding problems, they washed their hands of the matter by simply pointing out that they don’t support those drives in that enclosure (or any other Raid enclosure they sell). You are totally on your own.
They are not listed on WD’s Red drive compatibility lists (nor are any other hardware Raid devices that previously only recommended or certified Enterprise drives). Nor do they specifically address Reds on their site (and their HCL is not easy to find).
I recently submitted a support ticket asking if they supported Red drives. They initially just said “no, they have never been tested”.
When I persisted and asked *why* they have not tested those drives, they responded by saying that because WD did not include them on their list they had no interest in testing them. I found that a strange response: does the tail wag the dog or the dog wag the tail?
Anyway, since I have no interest in using non-supported drives in a Raid device, I will continue to run the box in JBOD mode, as I always have, for the same reason, with the Green drives I currently use.
IOW, nothing has changed, except we consumers can now speculate about all the various vague claims by the various manufacturers of hard drives and the boxes that use them, and how the Raid system might respond to bad sectors.
A sad state of affairs.
Can you persistently patch the Red drives to disable TLER or change its setting for desktop use?
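One possible way to inspect or change the error recovery timeout is the `scterc` log option of smartmontools' `smartctl`, which reads and sets SCT Error Recovery Control. The sketch below only builds the command line and does not run it. It assumes the drive exposes SCT ERC at all, which is not confirmed here for the Red, and it cannot tell you whether a changed value survives a power cycle. The helper name and the `/dev/sdX` device path are placeholders.

```python
import shlex

def scterc_command(device, read_ds=None, write_ds=None):
    """Build a smartctl command that queries or sets SCT Error Recovery Control.

    Times are in tenths of a second; 0 turns the time limit off.
    With no times given, the command only reports the current setting.
    """
    option = "scterc"
    if read_ds is not None and write_ds is not None:
        option += f",{read_ds},{write_ds}"
    return ["smartctl", "-l", option, device]

# /dev/sdX is a placeholder; substitute the real device before running anything.
print(shlex.join(scterc_command("/dev/sdX")))          # query the current setting
print(shlex.join(scterc_command("/dev/sdX", 0, 0)))    # ask the drive to disable the limit
print(shlex.join(scterc_command("/dev/sdX", 70, 70)))  # ask for 7.0 s read/write limits
```

If the value turns out not to persist on a given drive, one workaround would be to reissue the command from a startup script.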
Great review Allyn!
Very informative.
Bought 2 WD20EFRX (2TB)
Seriously. This is a piece of crap drive. I just lost all my data on a drive that was only 2 weeks old. I’m trying to recover it, but the whole drive shows up as RAW and is extremely slow to access.
Was good for the first week. Then boom!!!! Instant poop.
Really the Reds seem to be Greens rebranded with new firmware/RAM. Unless you are using hardware RAID (and running 24/7) they are not worth it. Go up to enterprise (actual enterprise) drives and never look back.
Also, people with early failures aren’t stressing their builds before putting drives into production. I’ve never lost data due to early failure. I have lost a Green to head parking in a NAS, though. WDIDLE is a must for Greens (and probably Reds).