RAID 101
A quick RAID primer
A Redundant Array of Independent Disks, or RAID, is a means of combining multiple hard disks into a single group, or array. The ‘I’ originally stood for Inexpensive, and it still applies today, but only in the relative sense, as a single drive matching the capacity of modern arrays is Unobtanium.
For the purposes of this article I will be sticking with RAID 5 and 6. There are other possibilities, but these are the most prevalent for mass storage purposes.
(from left to right) RAID 5 and RAID 6 span data across all disks in even ‘stripes’.
I’ve used some form of RAID-5 or 6 for my personal mass storage needs for some time now. For those unfamiliar, RAID-5 spreads all data across the installed drives in such a way that any one drive can fail and data integrity can be maintained. RAID-6 does the same, but doubles up on the redundant data such that any 2 drives can fail. There is no free lunch to this redundancy, and you must donate 1 or 2 drives’ worth of capacity to the cause. Lower-end RAID hardware (and especially motherboard RAID) takes a performance hit on write speeds, as the extra parity data must be calculated on the fly. Consumer motherboard RAID controllers perform these calculations on the host system CPU, which eats into system resources. Although a modern CPU is very fast, it is typically not as efficient as the dedicated XOR (parity) calculation engines present on modern RAID cards.
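To make the parity idea concrete, here is a minimal Python sketch of single-parity (RAID-5 style) redundancy over one stripe. The helper names `xor_blocks` and `recover_block`, the 4-drive layout and the tiny 4-byte blocks are all illustrative assumptions, not any controller's actual implementation. Real controllers also rotate the parity block across the drives, and RAID-6 adds a second, differently computed syndrome that is not shown here.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def recover_block(surviving_blocks):
    """Rebuild the one missing block of a stripe from all the survivors.

    XOR-ing every block of a stripe (data plus parity) yields all zeros,
    so the missing block equals the XOR of the remaining ones.
    """
    return xor_blocks(surviving_blocks)

# One stripe on a hypothetical 4-drive array: three data blocks, one parity block.
data = [b"\x01\x02\x03\x04", b"\x10\x20\x30\x40", b"\xaa\xbb\xcc\xdd"]
parity = xor_blocks(data)  # this is the "extra" block written on every update

# Simulate losing the drive that held data[1], then recompute its contents.
survivors = [data[0], data[2], parity]
rebuilt = recover_block(survivors)
assert rebuilt == data[1]
```

Every write has to update the parity block as well as the data block, which is where the write penalty described above comes from.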
RAID hardware, through either an ‘option ROM’ or a special driver interface in Windows, allows a user to create a volume from a given set of attached hard drives. The volume appears to the OS as if it were a single drive whose capacity is determined by the installed drives. Most controllers support some form of capacity expansion, where drives can be added to an array at a later date. This process is called ‘migration’. Since RAID hardware is not data aware, migrations require the card to process *all* sectors on *all* drives of the array, rearranging the data front-to-back and redistributing it across the added drives. Depending on the implementation, this can be one of the riskiest operations one can perform on an array. Some drives may have developed bad sectors over time, and the migration may be the first time those sectors have been read in a long while. Unreadable sectors, especially over multiple drives, can result in drives ‘dropping out’ of the array mid-migration, and ultimately may cause loss of the entire array with almost no hope of successful data recovery. There are ways to reduce this risk, such as occasionally verifying all array data (called ‘data scrubbing’), which finds bad sectors early so they can be corrected using the redundant data present in the array.
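The scrubbing idea can be sketched in a few lines of Python. This is a toy model under stated assumptions: the array is a list of stripes, each stripe is a list of equal-length byte blocks protected by single XOR parity, and an unreadable sector is represented by `None`. The function name `scrub` and this data layout are hypothetical; real controllers work at the sector level and expose scrubbing through their own tools.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def scrub(stripes):
    """Read every stripe and repair single unreadable blocks in place.

    Returns (repaired, unrecoverable): a list of (stripe_index, drive_index)
    repairs, and a list of stripe indices with too many bad blocks to fix.
    """
    repaired, unrecoverable = [], []
    for s, stripe in enumerate(stripes):
        bad = [d for d, block in enumerate(stripe) if block is None]
        if len(bad) == 1:
            # Single parity can rebuild exactly one missing block per stripe.
            good = [block for block in stripe if block is not None]
            stripe[bad[0]] = xor_blocks(good)
            repaired.append((s, bad[0]))
        elif len(bad) > 1:
            # More bad blocks than redundancy: the case that can sink a
            # single-parity array in the middle of a migration.
            unrecoverable.append(s)
    return repaired, unrecoverable

# Toy 3-drive array; for simplicity the last block of each stripe is parity.
stripes = [
    [b"\x01\x02", b"\x03\x04", b"\x02\x06"],  # healthy stripe
    [b"\x10\x20", None,        b"\x40\x60"],  # one unreadable block
    [None,        None,        b"\xff\xff"],  # two unreadable blocks
]
repaired, unrecoverable = scrub(stripes)
```

Run regularly, a pass like this catches the one-bad-block case while the redundancy is still intact, instead of discovering it during a migration when a second bad block in the same stripe would be fatal.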
An example of RAID configuration via option ROM.
A normally functioning array withstands drive failures by dropping into a ‘degraded’ mode of operation. The default for most arrays is to keep the data available (and modifiable) to the user even while in this mode. The array will not perform as well, since the missing drive’s data must be calculated from the extra parity info spread among the other drives of the array. Each RAID manufacturer has its own ideas on how to inform the user of a drive failure. Almost all hardware RAID cards have an on-board buzzer. Higher-end hardware (e.g. Areca) has a dedicated Ethernet port at the rear and can send warning emails directly over the network, regardless of the host operating system. Motherboard RAID relies on the driver itself issuing a warning through the OS or, where the full driver was not installed, on the user being lucky enough to see the warning flash by during a system boot. Once the user receives the warning, the failed drive is removed and replaced. The array then begins a ‘rebuild’ process, where the missing pieces are recreated from the parity data present on the other drives. Like the migration process, this requires every drive to be read ‘front to back’, and the new drive is written in the same manner. Once the rebuild is complete, the array returns to normal operation.
One thing commonly overlooked is that while a RAID-5 is degraded, the array is *extremely* vulnerable. Any additional failure will result in loss of the entire array, and the chance of one grows roughly in proportion to the number of remaining drives. To minimize those chances, a degraded array should be powered down until the drive can be replaced. Some controllers let you connect an extra drive as a ‘hot spare’ to minimize the time between failure and rebuild. The hot spare immediately takes the place of a failed drive to help reduce the window of vulnerability. Even with a hot spare, a subsequent failure or read error occurring during the rebuild can still take down the array. This is the primary reason for the emergence of RAID-6, as it keeps the data redundant after a single failure as well as during the critical rebuild process. Depending on the controller manufacturer and implementation, RAID-5 can be converted to RAID-6, but some controllers only allow this when another drive is added to the array.
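As a rough worked example of how that exposure scales with drive count, the Python sketch below computes the chance that at least one of the remaining drives fails while the array is degraded. It assumes drive failures are independent and that each drive has the same probability `p_single` of failing during the window; both are simplifications, and the value 0.001 is a made-up number for illustration, not a measured failure rate. The function name `p_additional_failure` is likewise hypothetical.

```python
def p_additional_failure(remaining_drives, p_single):
    """Chance that at least one remaining drive fails during the window.

    Assumes independent failures with the same per-drive probability.
    """
    return 1.0 - (1.0 - p_single) ** remaining_drives

p = 0.001  # made-up per-drive failure probability for the degraded window
for n in (3, 7, 15):
    exact = p_additional_failure(n, p)
    # For small p the result is close to n * p, i.e. roughly linear in n.
    print(f"{n:2d} remaining drives: exact {exact:.5f}, approx n*p {n * p:.5f}")
```

Drives bought together and aged together may well fail in a correlated way, so the independence assumption probably understates the real risk. It also ignores unreadable sectors, which can end a rebuild without any whole-drive failure.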
The spaghetti pictured above was necessary to accomplish a migration of my own personal array.
Reducing from 10 drives to 8 required all 18 (!) to be connected simultaneously.
A final (and significant) point which will become relevant later on is that while most arrays can be expanded by adding drives, it is nearly impossible to ‘contract’ an array without restarting from scratch. Want to move from eight 1TB drives to six 2TB drives? You’ll need to create a new array, meaning you need enough drives, power and SATA connections to support both arrays simultaneously for the transition. An additional RAID controller may be required if you run out of SATA ports, adding unwanted expense to what was meant to be a simple migration. With an additional RAID controller and a bunch of added power and SATA connections needed, it is likely you would need another complete system just to hold it all, and you would then be bandwidth limited by the link between both machines.
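The arithmetic behind that example is easy to sketch in Python. The helper names `usable_drives` and `migration_plan` are hypothetical, and since the text does not say which RAID level the two arrays use, RAID-5 on both sides is an assumption made purely for illustration.

```python
def usable_drives(total_drives, raid_level):
    """Drives' worth of usable capacity: RAID-5 gives up one drive, RAID-6 two."""
    parity_drives = {5: 1, 6: 2}[raid_level]
    if total_drives <= parity_drives:
        raise ValueError("not enough drives for this RAID level")
    return total_drives - parity_drives

def migration_plan(old_count, old_tb, new_count, new_tb, raid_level):
    """Summarize what a copy-to-new-array migration needs."""
    old_capacity = usable_drives(old_count, raid_level) * old_tb
    new_capacity = usable_drives(new_count, raid_level) * new_tb
    return {
        "old_usable_tb": old_capacity,
        "new_usable_tb": new_capacity,
        # Both arrays must be online at once, so every drive needs a port.
        "ports_needed": old_count + new_count,
        "fits": new_capacity >= old_capacity,
    }

# Eight 1TB drives to six 2TB drives, assuming RAID-5 on both arrays.
plan = migration_plan(8, 1, 6, 2, raid_level=5)
```

Under those assumptions the old array offers 7TB usable and the new one 10TB, and the move needs 14 drive connections (plus power for all 14 drives) for the duration of the copy.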