RAID Levels, Data Striping, Disk Mirroring and Parity Fault Tolerance

RAID – Redundant Array of Independent Disks
RAID is a hard disk technology that can be used to speed up data transfer and/or provide disk redundancy. It provides these features by utilising more than one hard disk at a time. There are several variations of a RAID configuration, referred to as levels, and each level provides different performance and/or fault tolerance benefits. Below are some RAID configurations, each with a brief description.

RAID 0
This configuration is the fastest of all the RAID levels. It uses a technique called data striping (see below) and requires at least 2 hard disks. It provides no fault tolerance: if any one disk fails, the data on the whole array is lost.

RAID 1
This level uses a pair of hard disks to provide fault tolerance and requires at least 2 hard disks. It gives no write-performance benefit, although some implementations can speed up reads by serving them from either disk.
Using a technique called disk mirroring (see below), the same data is written to both disks at the same time, so if one hard disk crashes, the same data is still available from the remaining disk.

RAID 2
The RAID 2 configuration uses data striping (see below) together with dedicated error-correction disks, and requires at least 3 disks.
Some of the disks store the data and the remaining disk(s) store the error-correction information; the classic RAID 2 design uses a Hamming code, which is built from parity calculations (see below).
RAID 2 stripes the data into bits, which is why RAID 3 (below) is considered a better implementation: it stripes the data into bytes instead. RAID 2 is rarely used in practice.

RAID 3
A RAID level similar to RAID 2, except that the data is striped into bytes rather than bits and a single dedicated disk holds the parity information, giving a performance benefit over RAID 2.

RAID 4
RAID 4 stripes the data into blocks and uses a dedicated parity drive for fault tolerance. At least 3 drives are required. It is not a commonly used implementation.

RAID 5
A popular RAID configuration utilising at least 3 drives.
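Data is striped across the drives in blocks, and instead of using a dedicated parity drive, the parity information is distributed across all of the drives. For any given stripe, the parity is stored on a different drive from the data it protects, so if any one drive fails, its contents can be rebuilt using the parity technique.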
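The exact placement of RAID 5 parity varies between implementations. The sketch below assumes a simple rotating scheme in which the parity for stripe s sits on disk (n - 1 - s) mod n; the function name `raid5_parity_layout` is hypothetical and this scheme is only one of several in use. It shows how every disk ends up holding some of the parity, rather than one dedicated parity drive.

```python
def raid5_parity_layout(num_disks, num_stripes):
    """Return a (parity_disk, data_disks) pair for each stripe.

    Assumes a simple rotating placement (hypothetical, for illustration):
    parity for stripe s goes on disk (num_disks - 1 - s) % num_disks and
    the data chunks go on the remaining disks in ascending order.
    """
    if num_disks < 3:
        raise ValueError("RAID 5 needs at least 3 disks")
    layout = []
    for stripe in range(num_stripes):
        parity_disk = (num_disks - 1 - stripe) % num_disks
        data_disks = [d for d in range(num_disks) if d != parity_disk]
        layout.append((parity_disk, data_disks))
    return layout


for stripe, (parity_disk, data_disks) in enumerate(raid5_parity_layout(3, 4)):
    print(f"stripe {stripe}: parity on disk {parity_disk}, data on disks {data_disks}")
```

With 3 disks the parity moves from disk 2 to disk 1 to disk 0 and then wraps around, so the extra parity writes are spread over all the drives instead of queuing up on a single one.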

Data Striping
Data striping is a technique for spreading data across more than one storage device so that it can be written and read in parallel.
Before the data is written, it is broken up into units whose size depends on the RAID configuration (level) used, for example bits, bytes or blocks.
The units are then written to the disks in turn, so that consecutive units land on different disks and all of the disks can be written simultaneously; one row of units across all of the disks is called a stripe.
Because the read/write heads of all the disks can work at once, this improves throughput compared with writing or retrieving the data to or from a single disk.
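The round-robin idea can be sketched in a few lines of Python. This is a toy model, not how a real controller works: each "disk" is just a list of chunks, and the names `stripe` and `unstripe` are made up for illustration. A `chunk_size` of 1 mimics byte-level striping, while a larger value mimics block-level striping.

```python
def stripe(data, num_disks, chunk_size):
    """Deal data out to num_disks disks in chunk_size pieces, round-robin.

    Chunk k goes to disk k % num_disks; each disk is modelled as a list of chunks.
    """
    disks = [[] for _ in range(num_disks)]
    for offset in range(0, len(data), chunk_size):
        chunk_number = offset // chunk_size
        disks[chunk_number % num_disks].append(data[offset:offset + chunk_size])
    return disks


def unstripe(disks):
    """Reassemble the data by reading one stripe (row of chunks) at a time."""
    chunks = []
    row = 0
    while any(row < len(disk) for disk in disks):
        for disk in disks:
            if row < len(disk):
                chunks.append(disk[row])
        row += 1
    return b"".join(chunks)


disks = stripe(b"ABCDEFGHIJ", num_disks=3, chunk_size=2)
print(disks)            # [[b'AB', b'GH'], [b'CD', b'IJ'], [b'EF']]
print(unstripe(disks))  # b'ABCDEFGHIJ'
```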

Disk Mirroring and Disk Duplexing
This technique does exactly what it sounds like: it creates a mirror image of the information on one disk onto another.
The data is written to both disks at the same time. If one disk fails, the mirror disk can be used immediately to provide requested data and/or to restore the lost data.
Disk mirroring usually uses one disk controller for both drives, which can lead to problems: if the disk controller fails, neither disk is available.
Disk duplexing gets around this problem by using a separate controller for each disk, so the failure of a single controller no longer makes the data unavailable.
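A toy model can make the difference between mirroring and duplexing concrete. Everything here is hypothetical: the disks are in-memory dictionaries, the class `MirroredVolume` and the controller names are made up, and a real RAID 1 implementation would also have to resynchronise a replaced disk, which this sketch ignores.

```python
class MirroredVolume:
    """Toy RAID 1 volume: each write goes to both disks of a pair.

    Disks are in-memory dicts (block number -> bytes). `controllers` names the
    controller each disk is attached to: the same name twice models plain
    mirroring, two different names model duplexing.
    """

    def __init__(self, controllers):
        self.disks = [{}, {}]
        self.controllers = list(controllers)
        self.failed_disks = set()
        self.failed_controllers = set()

    def _available(self, i):
        # A disk is usable only if neither it nor its controller has failed.
        return i not in self.failed_disks and self.controllers[i] not in self.failed_controllers

    def write(self, block, data):
        targets = [i for i in range(2) if self._available(i)]
        if not targets:
            raise OSError("no disk reachable")
        for i in targets:
            self.disks[i][block] = data

    def read(self, block):
        for i in range(2):
            if self._available(i):
                return self.disks[i][block]
        raise OSError("no disk reachable")


mirrored = MirroredVolume(controllers=("ctrl0", "ctrl0"))
duplexed = MirroredVolume(controllers=("ctrl0", "ctrl1"))
for volume in (mirrored, duplexed):
    volume.write(7, b"payroll")
    volume.failed_controllers.add("ctrl0")  # the first controller fails

print(duplexed.read(7))  # b'payroll', served by the disk on ctrl1
try:
    mirrored.read(7)
except OSError as err:
    print("mirrored volume:", err)  # both disks were behind ctrl0
```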

Parity Fault Tolerance
This technique is fairly simple but very effective.

It works by performing a logical operation on the data as it is stored and writing the result either to a dedicated disk (RAID 2, 3 and 4) or distributed across the data disks themselves (RAID 5).

The logical operation normally used is XOR (eXclusive OR).

For example, if there were 4 data disks in use, called DD1, DD2, DD3 and DD4, and one parity disk, PD1, then whenever data is written to the drives the logical operation performed would be:

DD1 XOR DD2 XOR DD3 XOR DD4

and the result would be stored in PD1.
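In practice the XOR is applied bit by bit across corresponding positions on each data disk. A minimal Python sketch, assuming each disk contributes an equal-length block of bytes; the sample bytes and the helper name `xor_blocks` are made up for illustration:

```python
from functools import reduce


def xor_blocks(*blocks):
    """XOR equal-length byte blocks together, one byte position at a time."""
    if len({len(block) for block in blocks}) != 1:
        raise ValueError("all blocks must be the same length")
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


DD1 = bytes([0x0F, 0xA0])
DD2 = bytes([0x33, 0x01])
DD3 = bytes([0x55, 0xFF])
DD4 = bytes([0x00, 0x10])

PD1 = xor_blocks(DD1, DD2, DD3, DD4)
print(PD1.hex())  # 694e
```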

Then, if any one of the drives fails, its data can be rebuilt by performing a similar calculation, this time substituting the value of PD1 for the missing data.

For example, if we were missing DD2, the calculation would be:

DD1 XOR PD1 XOR DD3 XOR DD4 = DD2 (the missing data)
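Because XOR is its own inverse, XORing the parity with every surviving block cancels them out and leaves only the missing block. The sketch below (hypothetical helper names `xor_into` and `rebuild_missing`, made-up four-byte sample data) checks that any single one of four data disks can be recovered this way:

```python
def xor_into(target, block):
    """XOR block into the bytearray target, byte by byte."""
    for i, byte in enumerate(block):
        target[i] ^= byte


def rebuild_missing(surviving_blocks, parity):
    """Recover the one missing data block from the parity and all survivors."""
    result = bytearray(parity)
    for block in surviving_blocks:
        xor_into(result, block)
    return bytes(result)


data_disks = [b"RAID", b"five", b"xor!", b"demo"]

# Parity disk: XOR of all four data blocks.
parity = bytearray(4)
for block in data_disks:
    xor_into(parity, block)
parity = bytes(parity)

for missing in range(len(data_disks)):
    survivors = data_disks[:missing] + data_disks[missing + 1:]
    assert rebuild_missing(survivors, parity) == data_disks[missing]
print("each single-disk failure was rebuilt correctly")
```

Losing two data disks at once leaves two unknowns in a single equation, which is why a single parity disk can only cover the failure of one drive.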

One drawback of this type of fault tolerance is a loss of speed due to the overhead of the parity calculations. It also uses up disk space (one whole disk in the example above), but it can be invaluable when it comes to retrieving otherwise lost data.
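The disk-space cost can be put into numbers. This simplified sketch assumes equal-sized disks, ignores controller metadata and hot spares, and leaves out RAID 2 because its Hamming-code overhead depends on the number of data disks; the function name `usable_capacity` is hypothetical.

```python
def usable_capacity(level, num_disks, disk_size_gb):
    """Approximate usable space, in GB, for a RAID level built from equal disks."""
    minimum = {0: 2, 1: 2, 3: 3, 4: 3, 5: 3}
    if level not in minimum:
        raise ValueError(f"RAID {level} is not covered by this sketch")
    if num_disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return num_disks * disk_size_gb        # striping only, nothing reserved
    if level == 1:
        return disk_size_gb                    # every disk holds the same copy
    return (num_disks - 1) * disk_size_gb      # RAID 3/4/5: one disk's worth of parity


for level in (0, 1, 5):
    disks = 2 if level == 1 else 4
    usable = usable_capacity(level, disks, 1000)
    print(f"RAID {level}: {disks} x 1000 GB -> {usable} GB usable "
          f"({usable / (disks * 1000):.0%} efficiency)")
```

For four 1000 GB disks this reports 4000 GB for RAID 0 and 3000 GB for RAID 5, while a two-disk RAID 1 mirror gives 1000 GB: the parity levels give up less space for redundancy than mirroring does, at the cost of the calculation overhead described above.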
