Replacing a failed disk in a Linux MD RAID

Introduction

Replacing a failed disk on hardware RAID is easy: you pull the disk with the blinking light, stuff in a new one, and the array rebuilds automatically. Software RAID is a little different.

Case Study

I came into work one morning to see an email from one of our servers. One of the disks had failed overnight, and mdadm had kindly emailed me to let me know the array was degraded. This server runs two 72GB SCSI disks in a RAID1. Each disk has one partition on it. (I use LVM to break up my filesystems, so I only have to deal with mirroring one partition.)

Since people were actually using the server, taking it down to replace the disk was out of the question. Not only that, but we'd paid good money for a hot-pluggable backplane in the server. I didn't want to leave the machine running without any redundancy for a significant amount of time, so I decided to hot-swap the failed disk.

Preparation - Determine the disk model

The first thing you need to do is determine the model of the disk that has failed. Disks differ slightly in capacity from model to model, even at the same nominal size, and you can only replace the failed disk with one of the same or larger capacity. The mirrored partitions must be exactly the same size, so the new disk needs enough room for a partition the same size as the one on the good array element.
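As a rough aid for this step, the sketch below pulls the vendor and model of each disk out of a /proc/scsi/scsi listing like the one shown later in this article. The function name scsi_disk_models is hypothetical, and it assumes the older /proc/scsi/scsi layout with single-word vendor and model strings.

```shell
# Hypothetical helper: print "VENDOR MODEL" for every Direct-Access (disk)
# entry in /proc/scsi/scsi-style text read on stdin, in listing order.
# Assumes single-word vendor and model strings (multi-word models get cut off).
scsi_disk_models() {
    awk '
        /^Host:/ { vendor = ""; model = "" }    # a new device entry starts
        /Vendor:/ {
            for (i = 1; i < NF; i++) {
                if ($i == "Vendor:") vendor = $(i + 1)
                if ($i == "Model:")  model  = $(i + 1)
            }
        }
        /Type:/ && /Direct-Access/ { print vendor, model }
    '
}

# Usage: scsi_disk_models < /proc/scsi/scsi
```

The model name only narrows things down. Running blockdev --getsize64 /dev/sda as root reports the exact capacity in bytes, which is the number the replacement has to match or exceed.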

Identify the failed array element

The first step is to positively identify the failed disk, since pulling the wrong disk would be rather catastrophic. Run cat /proc/mdstat and look for the disk marked as failed. In the case of our example server, /proc/mdstat looks like this:

Personalities : [raid1]
md0 : active raid1 sda1[0] sdb1[2](F)
    71681920 blocks [2/1] [U_]

unused devices:

As you can see above, the (F) indicates that sdb1 has failed and needs to be replaced, and [2/1] [U_] shows that only one of the array's two members is still up. Since we only have one partition per disk, the replacement is fairly easy.
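A minimal sketch of the same check in script form, assuming the /proc/mdstat layout above. md_failed_members is a hypothetical name; it is only a convenience for spotting (F) flags, not something mdadm itself provides.

```shell
# Hypothetical helper: print "ARRAY MEMBER" for every array member flagged
# (F) in /proc/mdstat-style text read on stdin.
md_failed_members() {
    awk '
        # Array lines look like: md0 : active raid1 sda1[0] sdb1[2](F)
        $1 ~ /^md[0-9]+$/ && $2 == ":" {
            for (i = 3; i <= NF; i++) {
                if ($i ~ /\(F\)$/) {
                    dev = $i
                    sub(/\[.*$/, "", dev)    # drop the "[2](F)" suffix
                    print $1, dev
                }
            }
        }
    '
}

# Usage: md_failed_members < /proc/mdstat
```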

Identify the failed disk

The next step is to determine which physical disk has failed. All of our servers have hot-pluggable backplanes, and unless someone has been messing with the SCSI ordering, slot 0 is /dev/sda, slot 1 is /dev/sdb, and so on. This should give you a reasonable indication of which disk to replace (in our example, /dev/sdb is the second disk, in slot 1). The activity light on the drive is another good indicator: the failed drive shows no activity, while the good drive flickers briefly as users access data.

Now that we know for sure which disk to pull, we can start the process. Unlike with hardware RAID, however, we cannot simply pull the drive from the server. The kernel must first be told that the disk is about to be removed; otherwise it will probably crash.
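If you want to script the slot guess described above, here is a minimal sketch. It assumes classic sdX names and the same one-disk-per-slot, unchanged-SCSI-ordering arrangement as our servers; both function names are hypothetical.

```shell
# Hypothetical helper: strip the partition number, e.g. sdb1 -> sdb.
# Assumes classic sdX naming (not nvme0n1p1-style names).
disk_of_partition() {
    printf '%s\n' "$1" | sed 's/[0-9]*$//'
}

# Hypothetical helper: 0-based position of a disk among sda, sdb, ...
# (sdb -> 1). Under the ordering assumption above, this is the slot number.
disk_slot() {
    letter=${1#sd}                   # sdb -> b
    case $letter in
        [a-z]) ;;
        *) return 1 ;;               # sdaa and friends are not handled
    esac
    # POSIX printf: an argument starting with a quote yields the character code
    echo $(( $(printf '%d' "'$letter") - $(printf '%d' "'a") ))
}

# Usage: disk_slot "$(disk_of_partition sdb1)"
```

On 2.6 and later kernels, readlink /sys/block/sdb/device should end in host:channel:id:lun (0:0:1:0 for our example disk), which is a more direct check than counting letters.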

Remove the failed disk from the array

The first step is to remove the failed element from the array. Since it is already marked as failed, it is a simple matter of running mdadm with the -r (--remove) option on the failed device. In our example, this is the command I used:

mdadm /dev/md0 -r /dev/sdb1
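The command above works because the element is already marked failed. If you ever need to pull a member that mdadm still considers active, it has to be failed first, since an active member cannot be removed. The hypothetical helper below only prints the two-step sequence so you can review it before running anything.

```shell
# Hypothetical helper: print (not run) the mdadm commands that take a
# member out of an array. --fail is only needed when the member is still
# active; --remove then takes it out of the array.
md_remove_cmds() {
    printf 'mdadm %s --fail %s\n' "$1" "$2"
    printf 'mdadm %s --remove %s\n' "$1" "$2"
}

# Usage: md_remove_cmds /dev/md0 /dev/sdb1
```

Printing instead of executing is deliberate: a typo in the array or device name is much easier to catch on screen than after the fact. Once the output looks right, you can pipe it to sh as root.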

Tell the SCSI subsystem to offline the disk

Now, we have to tell the SCSI driver in the kernel to take the disk offline so it can be removed. This is done by sending commands to /proc/scsi/scsi. The first step is to determine the host controller, channel, ID, and LUN of the device you want to remove. In our example, /proc/scsi/scsi looks like this:

Attached devices:
Host: scsi0 Channel: 00 Id: 00 Lun: 00
    Vendor:    SEAGATE    Model: ST373307LC    Rev: DS09
    Type:    Direct-Access		ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 01 Lun: 00
    Vendor:    SEAGATE    Model: ST373307LC    Rev: DS09
    Type:    Direct-Access		ANSI SCSI revision: 03
Host: scsi0 Channel: 00 Id: 06 Lun: 00
    Vendor:    SDR        Model: GEM318P    Rev: 1
    Type:    Processor			ANSI SCSI revision: 02

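Since /dev/sdb is the second disk listed, it is on Host 0, Channel 0, ID 1, LUN 0. Now we echo a string to /proc/scsi/scsi to mark the disk offline; the four numbers are the host, channel, ID, and LUN, in that order. Here's what I used for our example: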

echo "scsi remove-single-device 0 0 1 0" > /proc/scsi/scsi

Now, when you cat /proc/scsi/scsi, you'll see the device is no longer listed. You can now pull the disk from the chassis. Insert the replacement disk in the same slot.
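Reading the host, channel, ID, and LUN off the listing by eye works fine for one disk. The sketch below does the same lookup from a script, assuming, as above, that disks are named sda, sdb, ... in the order they appear as Direct-Access devices in /proc/scsi/scsi. scsi_addr_of_disk is a hypothetical name, and its argument is the 0-based disk position (1 for /dev/sdb).

```shell
# Hypothetical helper: print "H C I L" for the Nth (0-based) Direct-Access
# device in /proc/scsi/scsi-style text read on stdin.
scsi_addr_of_disk() {
    awk -v want="$1" '
        /^Host:/ {
            # "Host: scsi0 Channel: 00 Id: 01 Lun: 00" -> 0 0 1 0
            host = $2
            sub(/^scsi/, "", host)
            addr = (host + 0) " " ($4 + 0) " " ($6 + 0) " " ($8 + 0)
        }
        /Type:/ && /Direct-Access/ {
            if (n++ == want) { print addr; exit }
        }
    '
}

# Usage (prints the command string; review it before redirecting as root):
#   printf 'scsi remove-single-device %s\n' "$(scsi_addr_of_disk 1 < /proc/scsi/scsi)"
```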

Bring the disk online

Now, we can bring the disk online. Here's what I did for our example:

echo "scsi add-single-device 0 0 1 0" > /proc/scsi/scsi

The kernel will bring the disk online and spin it up.
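Spin-up can take several seconds, so it is worth confirming that the kernel has registered the disk before partitioning it. A minimal sketch, assuming the disk shows up by name in /proc/partitions once it is online; wait_for_disk is a hypothetical helper, and its third argument exists only so it can be pointed at a test file.

```shell
# Hypothetical helper: poll until DISK (e.g. sdb) is listed in
# /proc/partitions, checking once per second for up to TRIES seconds.
# Returns 0 when the disk appears, 1 on timeout.
wait_for_disk() {
    disk=$1
    tries=${2:-30}
    file=${3:-/proc/partitions}
    while [ "$tries" -gt 0 ]; do
        # /proc/partitions columns: major minor #blocks name
        awk -v d="$disk" '$4 == d { found = 1 } END { exit !found }' "$file" && return 0
        sleep 1
        tries=$((tries - 1))
    done
    return 1
}

# Usage: wait_for_disk sdb 60 && echo "sdb is online"
```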

Partition the new disk

Use cfdisk to create a new partition table to match what’s on the other disk in the mirror set. Example:

cfdisk -z /dev/sdb

Create a primary partition exactly the same size as the one on the other disk (otherwise the mirror will not re-establish), set its type to FD (Linux raid autodetect), and mark it bootable. Write the partition table and exit cfdisk.
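To confirm the new partition really matches, you can compare the size= fields (in sectors) from sfdisk -d on both disks. The sketch below extracts that field; part_size_from_dump is a hypothetical name, and it assumes sfdisk's dump format, which has changed a little across util-linux versions (hence the tolerance for spaces after size=).

```shell
# Hypothetical helper: print the size= value (in sectors) for partition P
# from `sfdisk -d` output read on stdin. Dump lines look roughly like:
#   /dev/sda1 : start=..., size=..., Id=fd, bootable
part_size_from_dump() {
    awk -v p="$1" '
        $1 == p && match($0, /size= *[0-9]+/) {
            s = substr($0, RSTART, RLENGTH)
            gsub(/[^0-9]/, "", s)    # keep only the digits
            print s
        }
    '
}

# Usage (as root): the two numbers should be identical.
#   sfdisk -d /dev/sda | part_size_from_dump /dev/sda1
#   sfdisk -d /dev/sdb | part_size_from_dump /dev/sdb1
```

A commonly used shortcut is to skip cfdisk and copy the whole table with sfdisk -d /dev/sda | sfdisk /dev/sdb. If you do, double-check the direction: writing the new disk's empty table over the good disk would ruin your day.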

Add the disk back to the array

Now use mdadm --add to hot-add the device back to the array. Example:

mdadm /dev/md0 --add /dev/sdb1

mdadm will put the device back into the array and start rebuilding the mirror. Run cat /proc/mdstat to check the progress of the rebuild.
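For an unattended check, the sketch below pulls the rebuild percentage for one array out of /proc/mdstat. It assumes the kernel reports progress on a line containing "recovery = N%" (or "resync = N%"), and md_recovery_pct is a hypothetical name.

```shell
# Hypothetical helper: print the rebuild percentage for array MD (e.g. md0)
# from /proc/mdstat-style text on stdin. Prints nothing if MD is not rebuilding.
md_recovery_pct() {
    awk -v md="$1" '
        $1 ~ /^md[0-9]+$/ { cur = $1 }    # remember which array we are in
        cur == md && match($0, /(recovery|resync) = *[0-9.]+%/) {
            s = substr($0, RSTART, RLENGTH)
            sub(/^[a-z]+ = */, "", s)     # keep just the percentage
            print s
        }
    '
}

# Usage: report progress once a minute until no rebuild is listed.
#   while grep -q recovery /proc/mdstat; do md_recovery_pct md0 < /proc/mdstat; sleep 60; done
```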