
RAID - How to configure in Linux?

Postby chandranjoy » Fri Dec 31, 2010 9:13 pm


RAID stands for Redundant Array of Inexpensive Disks. This is a solution where several physical hard disks (two or more) are governed by a unit called a RAID controller, which turns them into a single, cohesive data storage block.

An example of a RAID configuration would be to take two hard disks, each 80GB in size, and RAID them into a single unit 160GB in size. Another example of RAID would be to take these two disks and write data to each, creating two identical copies of everything.

RAID controllers can be implemented in hardware, which makes the RAID completely transparent to the operating systems running on top of these disks, or it can be implemented in software, which is the case we are interested in.
Purpose of RAID

RAID is used to increase the logical capacity of storage devices used, improve read/write performance and ensure redundancy in case of a hard disk failure. All these needs can be addressed by other means, usually more expensive than the RAID configuration of several hard disks. The adjective Inexpensive used in the name is not without a reason.

The major pluses of RAID are the cost and flexibility. It is possible to dynamically adapt to the growing or changing needs of a storage center, server performance or machine backup requirements merely by changing parameters in software, without physically touching the hardware. This makes RAID more easily implemented than equivalent hardware solutions.

For instance, improved performance can be achieved by buying better, faster hard disks and using them instead of the old ones. This necessitates spending money, turning off the machine, swapping out physical components, and performing a new installation. RAID can achieve the same with only a new installation required.

To summarize, the main advantages of RAID:
* Improved read/write performance in some RAID configurations.
* Improved redundancy in the case of a failure in some RAID configurations.
* Increased flexibility in hard disk & partition layout.

The problems with RAID are directly related to its advantages. For instance, while striping improves performance, it necessarily reduces the safety of the implementation. Conversely, with increased redundancy, space efficiency is reduced.

Other possible problems with RAID include:
* Increased wear of hard disks, leading to an increased failure rate.
* Lack of compatibility with other hardware components and some software, like system imaging programs.
* Greater difficulty in performing backups and system rescue/restore in the case of a failure.
* Limited or missing support in some operating systems expected to use the RAID.

RAID introduces a higher level of complexity into the system compared to conventional disk layout. This means that certain operating systems and/or software solutions may not work as intended. A good example of this problem is the LKCD kernel crash utility, which cannot be used in local dump configuration with RAID devices.

The problem with software limitations is that they might not be apparent until after the system has been configured, complicating things.

To sum things up for this section, using RAID requires careful consideration of system needs. In home setups, RAID is usually not needed, except for people who require exceptional performance or a very high level of redundancy. Still, if you do opt for RAID, be aware of the pros and cons and plan accordingly.

This means testing the backup and imaging solutions, the stability of installed software and the ability to switch away from RAID without significantly disrupting your existing setup.

RAID levels:
In the section above, we mentioned several scenarios where a particular RAID configuration benefits a particular aspect of system operation. These configurations are known as RAID levels, and they govern all aspects of RAID benefits and drawbacks, including read/write performance, redundancy and space efficiency.

There are many RAID levels. It would be impossible to list them all here. For details on all available solutions, you might want to read the Wikipedia article on the subject. The article not only presents the different levels, it also lists the support for each on different operating systems.

In this tutorial, we will mention the most common, most important RAID types, all of which are fully supported by Linux.

RAID 0 (Striping):

This level is achieved by grouping 2 or more hard disks into a single unit with the total size equaling that of all disks used. Practical example: 3 disks, each 80GB in size can be used in a 240GB RAID 0 configuration.

RAID 0 works by breaking data into fragments and writing them to all disks simultaneously. This significantly improves the read and write performance. On the other hand, no single disk contains the complete information for any piece of data committed. This means that if one of the disks fails, the entire RAID is rendered inoperable, with unrecoverable loss of data.

RAID 0 is suitable for non-critical operations that require good performance, like the system partition or the /tmp partition where lots of temporary data is constantly written. It is not suitable for data storage.

RAID 1 (Mirroring)
This level is achieved by grouping 2 or more hard disks into a single unit whose total size equals that of the smallest disk used. This is because RAID 1 keeps every bit of data replicated on each of its devices in exactly the same fashion, creating identical clones. Hence the name, mirroring. Practical example: 2 disks, each 80GB in size, can be used in an 80GB RAID 1 configuration.

On a side note, in mathematical terms, RAID 1 is an AND function, whereas RAID 0 is an OR.

Because of its configuration, RAID 1 reduces write performance, as every chunk of data has to be written n times, once to each of the mirrored devices. The read performance is identical to that of a single disk. Redundancy is improved, as normal operation of the system can be maintained as long as any one disk is functional.

RAID 1 is suitable for data storage, especially with non-intensive I/O tasks.


RAID 5 (Striping with parity)
This is a more complex solution, with a minimum of three devices used. RAID 5 stripes data across its members much like RAID 0, but additionally writes parity information, distributed across all of the devices. If one device malfunctions, the array will continue operating, reconstructing the missing data from the parity. The failure will be transparent to the user, save for the reduced performance.

RAID 5 improves read performance as well as redundancy, and is useful in mission-critical scenarios where both good throughput and data integrity are important. Write performance suffers somewhat, since every write also updates the parity, and the parity calculations induce a slight CPU penalty.
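The parity RAID 5 stores is a bitwise XOR across the data blocks in each stripe, which is what makes the rebuild possible: XOR-ing the parity with the surviving blocks yields the missing one. A toy sketch with shell arithmetic (the numbers simply stand in for disk blocks):

```shell
# Toy XOR parity: two data blocks and their parity (values are arbitrary)
d1=170                     # block on disk 1 (binary 10101010)
d2=204                     # block on disk 2 (binary 11001100)
parity=$(( d1 ^ d2 ))      # what the parity stripe stores (102)

# Disk 1 fails; rebuild its block from the parity and the surviving disk:
rebuilt=$(( parity ^ d2 ))
echo "$rebuilt"            # prints 170
```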

Linear RAID
This is a less common level, although fully usable. Linear is similar to RAID 0, except that data is written sequentially rather than in parallel. Linear RAID is a simple grouping of several devices into a larger volume, the total size of which is the sum of all members. For instance, three disks the sizes of 40, 60 and 250GB can be grouped into a linear RAID the total size of 350GB.

Linear RAID provides no read/write performance benefit, nor does it provide redundancy; the loss of any member will render the entire array unusable. It merely increases size. In effect, it is very similar to a simple LVM volume spanning several disks.

Linear RAID is suitable when you need to store data larger than the individual size of any disk or partition.
Other levels

There are several other levels available. For example, RAID 6 is very similar to RAID 5, except that it has dual parity. Then, there are also nested levels, which combine different level solutions in a single set. For instance, RAID 0+1 is a nested set of striped devices in a mirror configuration. This setup requires a minimum of four disks.

Nested RAID Levels:
RAID 01 OR RAID 0+1: two or more RAID 0 stripe sets, mirrored against each other.

RAID 10 OR RAID 1+0: two or more RAID 1 mirror sets, striped together.

These setups are less common, more complex and more suitable for business rather than home environment, therefore we won't talk about those in this tutorial. Still, it is good to know about them, in case you ever need them.

So, let's review what we've learned here. We have four major RAID levels that interest us, each offering different results. The most important parameters are the I/O performance, redundancy and space efficiency.

A few words on the table below:
# devices: this column defines the minimum number of devices required to create such a setup.
Efficiency: this term denotes how "well" the array uses the available space. For example, if the array uses all available space, then its efficiency is equal to the total number of devices used. For instance, a RAID 0 with four 80GB disks will have a total space of 320GB, in other words, 4 x 80GB - or simply: 4 (n).
Attrition: this tells us how many devices in the array can be lost without breaking the functionality of the array and losing data.

Here's a small table showing the differences between the four levels discussed above:

RAID level | # devices | Efficiency | Attrition
RAID 0     |     2     |     n      |     0
RAID 1     |     2     |     1      |    n-1
RAID 5     |     3     |    n-1     |     1
Linear     |     2     |     n      |     0
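As a quick illustration of the efficiency figures, the usable capacity for n identical disks of a given size can be computed with shell arithmetic (a toy sketch; the numbers follow the formulas above):

```shell
# Usable capacity for n identical disks of `size` GB each
n=4; size=80
echo "RAID 0:  $(( n * size )) GB"         # efficiency n   -> 320 GB
echo "RAID 1:  $(( size )) GB"             # efficiency 1   -> 80 GB
echo "RAID 5:  $(( (n - 1) * size )) GB"   # efficiency n-1 -> 240 GB
echo "Linear:  $(( n * size )) GB"         # efficiency n   -> 320 GB
```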

RAID notation:
We also have to talk about how RAID devices are seen and marked by Linux. In fact, compared to hard disk notation, which takes into consideration a lot of parameters like disk type and number, partition type, etc., RAID device notation is fairly simple.

RAID devices are marked by two letters md and a number. Example: md0, md3, md8. By themselves, the RAID device names tell us nothing about their type. In this regard, the RAID notation is lacking compared to disk/partition notation.

To be able to get more information about our RAID devices, we need additional tools. We will talk about these tools in greater detail later. For now, here's just a snippet of information.

/proc is a pseudo-filesystem on modern Linux operating systems. The term pseudo is used here because /proc does not represent a data structure on the disk; instead, it reflects the state of the kernel. In other words, /proc is a sort of window into the kernel, providing live information about the system internals at any given moment.

Many parameters about the operating system can be extracted from different files under the /proc tree. For instance, we can check all the running processes and their memory maps, we can check CPU information, mounts, and more. We can also check the status of our RAID devices.
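A few of those examples as one-liners (all of these files are standard on Linux):

```shell
# A few live values exposed under /proc
head -n 1 /proc/meminfo          # total system memory
cut -d ' ' -f 1 /proc/loadavg    # 1-minute load average
wc -l < /proc/mounts             # number of mounted filesystems
```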

The status of RAID devices can be checked by printing out the contents of the mdstat file under /proc:
cat /proc/mdstat

If there are any RAID devices present, they will be printed out to the screen (STDOUT).
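For illustration, here is how that output can be sliced with standard tools. The sample below is hard-coded so the snippet is self-contained (a hypothetical healthy two-disk RAID 1); on a real system you would read /proc/mdstat directly:

```shell
# Sample /proc/mdstat contents for a healthy two-disk RAID 1 (hypothetical)
sample='Personalities : [raid1]
md0 : active raid1 sdb1[1] sda1[0]
      78123968 blocks [2/2] [UU]

unused devices: <none>'

# Print each md device together with its RAID level
printf '%s\n' "$sample" | awk '/^md/ {print $1, $4}'
# prints: md0 raid1
```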


mdadm:
This is a very important, powerful Linux software RAID management utility. It has no less than seven modes of operation, which include assemble, build, create, monitor, grow, manage, and misc. mdadm is a command-line utility and requires super-user (root) privileges.

Later on, we will use it to manipulate our RAID arrays.

For now, here's a quick example:
Create RAID1:
root@Server:/# mdadm --create --verbose /dev/md0 --level=raid1 --raid-devices=2 /dev/sda1 /dev/sdb1

What do we have here?
* --create tells mdadm to create a new RAID device.
* --verbose tells it to print information about its operations.
* /dev/md0 is the new RAID device that we want to create.
* --level=raid1 defines the RAID level; in our case, RAID 1 (Mirror).
* --raid-devices=2 specifies how many disks (devices) are going to be used in the creation of the new RAID device.
* /dev/sda1 /dev/sdb1 are the two devices (one partition on each disk) that are going to be used in the creation.
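Once the array is built, most distributions expect it to be recorded in mdadm's configuration file so that it is assembled automatically at boot. As a sketch (the file lives at /etc/mdadm.conf or /etc/mdadm/mdadm.conf depending on the distribution, and the UUID shown is a placeholder), the line produced by `mdadm --detail --scan` looks something like:

```
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=...
```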


