Raid - Everything2.com

What is RAID?

RAID is an acronym used in the computing industry to describe a manner in which multiple hard disks are configured in a computer. The configuration is typically such that the disks provide some type of redundancy in the event that one of them fails. The acronym has historically stood for one of the following two phrases:

Redundant Array of Inexpensive Disks

Redundant Array of Independent Disks

As most people professionally involved in the computer industry can tell you, and I’m sure a number of very unhappy home users as well, the hard disks in computers are failure-prone and can have very short life spans. As one of the few components in a computer with moving parts (hard disks have metal platters that typically spin between 5,400 and 10,000 times per minute, or about 90-167 times per second), a hard disk is prone to mechanical failures that aren’t often seen in other components. Because disks are so apt to fail, and because they’re the one component that retains dynamic information between power cycles, it was deemed necessary to create redundancy. Hard disks themselves have no built-in redundancy: each bit is written to the disk in its proper location and promptly forgotten until it needs to be retrieved again, so redundancy had to come from an external feature. To that end, RAID was created to allow multiple disks to be used in an array as a backup for one another. It has since been extended to offer a feature called striping, which links more than one drive together as a single volume, with data written to each drive in turn to balance the load. Today, RAID is typically implemented in one of two ways: hardware RAID and software RAID.

Any explanation should be prefaced by the fact that the terms "hardware" and "software" are misleading. In the case of software RAID, the control of how data is written is left to the host operating system. In this instance, the operating system is able to see each drive in the array and address each drive individually, but instead of treating each drive as an independent device, the operating system abstracts their relationship and uses software to determine how data should be delivered to and from the drives. In the case of hardware RAID, the drives connect to a physical device that contains its own software, which manages the manner in which the data is delivered to and from the drives. As can be seen, in both cases software is used in the delivery of data, and the chief difference is whose software is used.

It is this difference that governs the argument over which type of RAID is more appropriate for use. In general, it is argued that software-based RAID, governed by the operating system, is less favorable because the operating system has a large number of tasks to handle in addition to array management. That means that the array has to compete for system resources, including processing time, which can result in the array’s speed being lowered due to its priority being below that of other tasks. Furthermore, since the operating system performs a wide array of functions, it is more vulnerable to failure and damage: a room with ten doors is harder to monitor than a room with one. Hardware RAID, on the other hand, has only a single set of operations it must perform and those operations involve only the hard disks: one door for one set of tasks. Furthermore, as it has only a specific set of tasks whose computational requirements are easily determined, hardware RAID controllers are manufactured with enough horsepower to handle all necessary tasks with capability left over. Being one step removed from the operating system, which must operate with interaction from users, is considered an enormous benefit because there is no way to "accidentally" change or damage the array structure: generally the system must be restarted so that the controller's configuration functions can be accessed before the operating system loads.

As a note of historical interest, the reason RAID formerly stood for Redundant Array of Inexpensive Disks is that in days past hard disks were quite expensive. A very large hard disk could cost a very large sum of money; the ability to link together multiple smaller disks to create a single large volume meant that a hard disk failure would only constitute a small financial loss. With the present-day low price of hard disks per megabyte, the acronym was changed to Redundant Array of Independent Disks.

What does RAID do?

Specifically, RAID is designed to do three things: provide mirroring between disks, stripe data across disks to equally distribute data, and create parity information for data written across disks, so in the event of disk failure the striped data can be recreated.

The benefits of mirroring are that if a single disk fails there is a real-time backup of all data and the system will continue to function normally. This gives a window of time to replace the failed disk and rebuild the mirror (which means to have the data from the surviving disk written to the new disk to re-establish mirroring). In addition, read operations can be sped up to as much as x times the speed of a single disk, where x is the number of disks in the mirror, because reads can be spread across all copies.

The benefits of striping are that x disks of size n (x being two or more) can be linked together to create a new volume of size (n * x). In addition, the speed of disk operations can improve by up to a factor of x, since each disk handles only part of every transfer.

The benefit of parity is that in the event of any single disk failure, it is possible to rebuild the entire array even though some data was written only to the disk that failed. Parity information is calculated such that lost data can be recreated from the parity data and the remaining data on the other drives. Write speed of arrays with parity is generally lower than that of single disks because parity must be calculated for each write, though this is not always the case as we will see later.
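To make the idea of parity concrete, here is a minimal sketch in Python. It assumes the parity calculation is a byte-wise XOR, which is the usual choice for single-parity arrays; the text above does not name a specific calculation, and the function name and sample data are purely illustrative.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of several equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three hypothetical data blocks, one per data disk.
data = [b"RAID", b"disk", b"fail"]

# The parity block that would be stored alongside them.
parity = xor_blocks(data)

# Simulate losing the second disk, then rebuild its block from
# the surviving data blocks plus the parity block.
survivors = [data[0], data[2], parity]
rebuilt = xor_blocks(survivors)

print(rebuilt == data[1])
```

Because XOR is its own inverse, the same function both generates the parity and recovers a missing block. It can only recover one missing block per stripe, which is why a single set of parity protects against a single disk failure.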

RAID restrictions

RAID functions best when the drives used in the configurations are the same type, speed and size. Differences in these characteristics can dramatically affect the speed of the array.

What types of RAID are there?

There are multiple types of RAID, and so far only the most common features have been discussed. Below is a comprehensive list of the types of RAID. Only the most common will be fully explicated.

RAID 0 – This type of RAID is a striping array and requires a minimum of two drives to implement. In this configuration, data is written sequentially in equal parts, called blocks, to all drives in the array, maximizing the space used on all drives, and read from the drives simultaneously to increase the read speed.
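As a rough illustration of how blocks might be dealt out across a striping array, the sketch below maps a logical block number to a drive and a position on that drive. The round-robin rule and the function name are assumptions made for the example; real controllers also have a configurable block (stripe) size that is ignored here.

```python
def stripe_location(logical_block, num_disks):
    """Map a logical block number to (disk index, block offset on that disk)."""
    return logical_block % num_disks, logical_block // num_disks


# Where the first six blocks of a volume would land on a two-disk array.
layout = [stripe_location(block, 2) for block in range(6)]
print(layout)
```

Consecutive blocks land on different drives, which is what allows them to be read or written at the same time.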

Disk speed is maximized in this configuration. Because of this it is favored for applications that require very fast response times, such as video editing and databases.

Because data is only striped across disks, this is not considered true RAID: the data is not redundant and the failure of any drive in the array will result in the loss of all data on all drives.

Drives used in this type of array should be of the same size; differences in size will result in the usable space on each disk in the array being limited to the size of the smallest drive.

RAID 1 – This type of RAID is a mirroring array and requires a minimum of two drives to implement. In this configuration, data is written simultaneously to all drives in the mirror and can be read from all drives simultaneously to increase the read speed.

Disk read speed is maximized in this configuration, though the write speed is equal to the speed of a single disk. Since the data is mirrored, the failure of any single drive in the mirrored array does not affect the integrity of the data.

Drives used in this type of array should be of the same size; differences in size will result in the size of the array being limited to the size of the smallest drive.

RAID 2 – This type of RAID is not commercially implemented by any manufacturer of RAID software or hardware. The array itself consists of two sets of disks: one on which the data is recorded, and a second which records ECC (error-correcting code) data for the data written to the first. When the data is read, its ECC information is read as well to do on-the-fly error correction.

Disk speed can be very high, but the number of disks required to implement this type of setup is large.

RAID 3 – This type of RAID is a striping parity array and requires a minimum of three disks to implement. In this configuration, data is striped across a minimum of two disks with parity information generated and stored on a third disk.

Disk speed is maximized in this configuration, and single disk failure does not result in the loss of data as the parity information can be used to repair the array with the addition of a new disk.

Disks used in this type of array should be of the same size; differences in size will result in the usable space on each disk being limited to the size of the smallest drive.

RAID 4 – This type of RAID is a striping parity array with the parity data written to a dedicated additional drive, and requires a minimum of three disks to implement. In this configuration, data is striped across the data drives in blocks rather than bytes, so each drive can service small reads independently of the others, and the parity information allows recovery from any single disk failure.

Read speed benefits from striping, but write speed is limited by the single parity disk, which must be updated on every write in addition to the overhead of generating the parity data.

Disks used in this type of array should be of the same size; differences in size will result in the usable space on each disk being limited to the size of the smallest drive.

This type of array is rarely implemented.

RAID 5 – This type of RAID is a striping parity array and requires a minimum of three disks to implement. In this configuration, data is striped across all disks in the array and the generated parity information is likewise distributed across all disks in the array rather than kept on a dedicated disk. This is the most common general-purpose RAID level because it combines redundancy with efficient use of disk space.

Disk read speed is maximized in this configuration, and because parity is spread across all disks rather than held on one, writes are not bottlenecked by a single parity disk. This array can sustain a single disk failure and continue functioning due to the parity data.
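The sketch below shows one way parity could be rotated across the disks of such an array, stripe by stripe. The particular rotation (parity starting on the last disk and moving one disk to the left each stripe) is an assumption chosen for illustration; actual controllers and software implementations use several different layouts, and the function name is hypothetical.

```python
def raid5_stripe(stripe, num_disks):
    """Return each disk's role for one stripe: 'P' for parity, 'D<n>' for data block n."""
    # Assumed rotation: parity starts on the last disk and moves left each stripe.
    parity_disk = (num_disks - 1 - stripe) % num_disks
    data_index = stripe * (num_disks - 1)
    roles = []
    for disk in range(num_disks):
        if disk == parity_disk:
            roles.append("P")
        else:
            roles.append(f"D{data_index}")
            data_index += 1
    return roles


# Print the first three stripes of a three-disk array, one row per stripe.
for stripe in range(3):
    print(raid5_stripe(stripe, 3))
```

Each row holds one parity block and (number of disks - 1) data blocks, and no single disk holds all of the parity, so parity updates are shared among the disks.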

Drives used in this type of array should be of the same size; differences in size will result in the usable space on each disk being limited to the size of the smallest drive.

RAID 6 – This type of RAID is a striping parity array and requires a minimum of four drives to implement (x drives' worth of data plus two drives' worth of parity, with x at least two). In this configuration, data is striped across all disks in the array and the generated parity information is striped across all disks in the array. In addition, a second set of parity data is generated and striped across all disks in the array.

This type of array is identical to RAID 5 with the exception that an additional parity set is created for all data. In this configuration the array can sustain the failure of any two disks while retaining data integrity.

This type of array is generally recommended for mission-critical applications where the possibility of multiple disk failure is not acceptable. In general, RAID 6 will be slower than RAID 5 because of the doubled computational requirements of the parity data generation.

Drives used in this type of array should be of the same size; differences in size will result in the usable space on each disk being limited to the size of the smallest drive.

RAID 10 – This type of RAID is a striping mirrored array and requires a minimum of four disks to implement. In this configuration, the disks are first grouped into mirrored pairs, and data is then striped across those pairs. In a four disk configuration there are two mirror sets of two disks each: every block is written to one mirror set, where it is duplicated on both of that set's disks, and successive blocks alternate between the two mirror sets.

This type of array has the fault tolerance of RAID 1: any single disk can fail, and further disks can fail as long as no mirror set loses both of its members. This type of configuration is recommended for systems requiring the speed of striping but with the fault tolerance of mirroring that data.

Drives used in this type of array should be of the same size; differences in size will result in the usable space on each disk being limited to the size of the smallest drive.

RAID 50 – This type of RAID is a striping array built on top of striping parity arrays: two or more RAID 5 sets are combined into a single RAID 0 stripe. Because each RAID 5 set needs at least three disks, it requires a minimum of six disks to implement.

This type of array is rarely implemented because of poor use of space.

RAID 0+1 – This type of RAID is a mirrored striping array and requires a minimum of four disks to implement. In this configuration (four disks), data is striped across two disks for the benefits of RAID 0, which is then mirrored on two additional disks for the redundancy of RAID 1.

This type of array can, like RAID 5, sustain the failure of any single disk, but has a capacity of only one-half the total capacity of the disks used.
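The practical difference between this layout and RAID 10 shows up when a second disk fails. The sketch below enumerates every possible two-disk failure in a four-disk array under a simplified model: in RAID 10 each mirror pair needs one working disk, while in RAID 0+1 a stripe set is treated as lost as soon as either of its disks fails. The disk numbering and function names are assumptions for the example.

```python
from itertools import combinations

DISKS = range(4)


def raid10_survives(failed):
    """RAID 10: disks (0, 1) and (2, 3) are mirror pairs, striped together."""
    pairs = [(0, 1), (2, 3)]
    # Every mirror pair needs at least one working disk.
    return all(any(d not in failed for d in pair) for pair in pairs)


def raid01_survives(failed):
    """RAID 0+1: disks (0, 1) and (2, 3) are stripe sets, mirrored together."""
    stripe_sets = [(0, 1), (2, 3)]
    # At least one stripe set must be completely intact.
    return any(all(d not in failed for d in s) for s in stripe_sets)


two_disk_failures = [set(c) for c in combinations(DISKS, 2)]
raid10_ok = sum(raid10_survives(f) for f in two_disk_failures)
raid01_ok = sum(raid01_survives(f) for f in two_disk_failures)

print(f"RAID 10 survives {raid10_ok} of {len(two_disk_failures)} two-disk failures")
print(f"RAID 0+1 survives {raid01_ok} of {len(two_disk_failures)} two-disk failures")
```

Under this model both layouts survive any single failure, but RAID 10 survives more of the possible double failures than RAID 0+1 does.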

RAID 0+1 should not be confused with RAID 10; a single disk failure in this array takes one whole stripe set out of service, leaving the array the equivalent of a RAID 0 array until the failed disk is replaced.

JBOD – This is not true RAID, and stands for "Just a Bunch of Disks." In this configuration, which requires a minimum of two disks, data is written to each disk in the array until that disk is full. Once that disk is full, data is written to the next disk in the array. This mode of writing is referred to as linear append. It offers no redundancy and no speed improvements. In addition, disks may be of different sizes without limiting the total capacity of the array.
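To tie the size rules above together, here is a small sketch that estimates usable capacity for the more common configurations. It is a simplified model, not a description of any particular controller: every level except JBOD is assumed to use only the smallest drive's worth of space on each disk, the per-level formulas are the standard ones rather than anything stated above, minimum disk counts are not enforced, and the function name and level labels are made up for the example.

```python
def usable_capacity(level, sizes):
    """Estimate usable capacity (same unit as `sizes`) under a simplified model."""
    count, smallest = len(sizes), min(sizes)
    if level == "JBOD":
        return sum(sizes)                 # linear append: every byte is usable
    if level == "0":
        return smallest * count           # striping only, no redundancy
    if level == "1":
        return smallest                   # every disk holds the same data
    if level == "5":
        return smallest * (count - 1)     # one disk's worth of parity
    if level == "6":
        return smallest * (count - 2)     # two disks' worth of parity
    if level in ("10", "0+1"):
        return smallest * count // 2      # half the disks hold mirror copies
    raise ValueError(f"unsupported level: {level}")


# Hypothetical drive sizes in gigabytes.
mixed = [500, 250]
print(usable_capacity("0", mixed), usable_capacity("JBOD", mixed))
print(usable_capacity("5", [500, 500, 500]))
```

With one 500 GB and one 250 GB drive, the striped array is held to the smaller drive on both disks while JBOD uses everything, which is the trade-off described in the entries above.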
