Monday, January 4, 2010

Convert a 2-disk RAID1 to a 3-disk RAID5 with mdadm

Ok folks, with this post I document how I converted a two disk RAID1 array (2x1.5TB, 1.5TB net) into a three disk RAID5 array (3x1.5TB, 3TB net) on my home server running ubuntu-server 8.04 LTS. The good thing about the whole story is that I was able to do this without any data loss, thanks to the linux-raid-tool-of-god mdadm.
To make a long story short, I simply added a third disk of the exact same size to the array and extended the filesystem.
OOOOOOkkkkkkaaaaayyyyy, it's not quite that simple, but almost:

First a few details on my RAID1 array. I built it a year ago with two Seagate ST31500341AS 1.5TB disks on the same ubuntu server that we're talking about now (yeah... never change a running system, you know). The two disks actually performed quite well, so there was no need to get rid of them, although they needed to be patched with a new firmware right after I purchased them. Maybe you heard about the Seagate firmware drama.
But as we all know, size matters, and 3TB is more than 1.5TB, so I decided to buy two additional disks to make myself a happy new year. After working through a thousand tests and recommendations (at least it felt like a thousand) I got myself a pair of Western Digital WD15EADS 1.5TB Caviar Green disks and fitted them into my server case.
After rebooting I checked whether they were recognized correctly:
me@server:~$ cd /dev/disk/by-id/
me@server:/dev/disk/by-id$ ls -la
total 0
drwxr-xr-x 2 root root 840 2010-01-03 20:10 .
drwxr-xr-x 6 root root 120 2009-12-31 11:09 ..
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 ata-ST31500341AS_9VS0AFRP -> ../../sdb
lrwxrwxrwx 1 root root  10 2009-12-31 11:09 ata-ST31500341AS_9VS0AFRP-part1 -> ../../sdb1
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 ata-ST31500341AS_9VS0DEF3 -> ../../sda
lrwxrwxrwx 1 root root  10 2009-12-31 11:09 ata-ST31500341AS_9VS0DEF3-part1 -> ../../sda1
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1287280 -> ../../sdd
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 ata-WDC_WD15EADS-00S2B0_WD-WCAVY1325320 -> ../../sdc
.
.
.
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 scsi-1ATA_ST31500341AS_9VS0AFRP -> ../../sdb
lrwxrwxrwx 1 root root  10 2009-12-31 11:09 scsi-1ATA_ST31500341AS_9VS0AFRP-part1 -> ../../sdb1
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 scsi-1ATA_ST31500341AS_9VS0DEF3 -> ../../sda
lrwxrwxrwx 1 root root  10 2009-12-31 11:09 scsi-1ATA_ST31500341AS_9VS0DEF3-part1 -> ../../sda1
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 scsi-1ATA_WDC_WD15EADS-00S2B0_WD-WCAVY1287280 -> ../../sdd
lrwxrwxrwx 1 root root   9 2009-12-31 11:09 scsi-1ATA_WDC_WD15EADS-00S2B0_WD-WCAVY1325320 -> ../../sdc
It seems they are. Because I'm no good friend of Mr. Murphy I decided to back up the data from the RAID1 to one of the new disks and add the other one to the array.
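For anyone following along, preparing the backup disk looks something like this (the filesystem type doesn't matter much, ext3 is just what I use for the sketch):
me@server:~$ sudo fdisk /dev/sdd
me@server:~$ sudo mkfs.ext3 /dev/sdd1
me@server:~$ sudo mkdir -p /mnt/sdd
me@server:~$ sudo mount /dev/sdd1 /mnt/sdd
With the partition on /dev/sdd created and mounted, backing up all the data was a simple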
me@server:~$ sudo cp -a /mnt/RAID/ /mnt/sdd
did the trick. To my surprise this worked like a charm, even for my rsnapshot folder containing half a year of incremental backups. By the way, take a look at rsnapshot for incremental backups on local or remote machines: it works on Linux as well as on Windows (via Cygwin) and is really easy to set up.
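If you're as paranoid as me, you can verify the copy before touching the array; diff -rq only prints differences, so no output means the two trees match (checking 1.5TB takes a while, though):
me@server:~$ sudo diff -rq /mnt/RAID /mnt/sdd/RAID
But back to the RAID. First I created a partition on the third disk: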
me@server:~$ sudo fdisk /dev/sdc
I made sure that the new partition had the exact same size (in blocks, not in bytes) as the two other ones and that the partition type was fd (Linux raid autodetect).
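If you don't want to fiddle with fdisk by hand, sfdisk can clone the partition table from one of the existing disks instead (this works for the classic MBR partition tables used here; triple-check the target device before hitting enter):
me@server:~$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdc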

Now I started with the fun stuff.
First I unmounted the RAID:
me@server:~$ sudo umount /dev/md0
Then I stopped the RAID:
me@server:~$ sudo mdadm --stop /dev/md0
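Before overwriting anything, it can't hurt to record what the old RAID1 superblocks look like (UUID, event count and so on), so you have something to refer to if things go sideways:
me@server:~$ sudo mdadm --examine /dev/sda1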
Now I created the RAID5 metadata, overwriting the RAID1 metadata:
me@server:~$ sudo mdadm --create /dev/md0 --level=5 -n 2 /dev/sda1 /dev/sdb1
At this point I got warnings that the two disks seemed to be part of a RAID array already, and I was asked if I really wanted to proceed with creating the array. I said yes, but you know, I had backups ;-). What actually happened then is that I created a degraded two disk RAID5 array. This works without data loss because a two disk RAID5 is effectively a mirror (the parity of a single data chunk is the chunk itself) and the 0.90 superblock sits at the end of the partition, so the filesystem data stays exactly where it was. Rebuilding, as mdadm calls it, took something like 7 to 8 hours. I kept another terminal open and checked the progress from time to time with
me@server:~$ cat /proc/mdstat
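If you don't feel like retyping that all the time, watch will refresh it for you:
me@server:~$ watch -n 10 cat /proc/mdstat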
After mdadm was finished I mounted the RAID again and was very happy to see that all the data was still there.
Now I added the new disk to the array:
me@server:~$ sudo mdadm --add /dev/md0 /dev/sdc1
At this point the third disk is added as a spare device to the array, which you can confirm in the detail view:
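me@server:~$ sudo mdadm --detail /dev/md0
The freshly added disk should show up at the bottom of the device list, flagged as a spare. Growing the array to its new size was done by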
me@server:~$ sudo mdadm --grow /dev/md0 --raid-disks=3 --backup-file=/home/me/raid1to5backup.file
To be honest, I passed the --backup-file parameter to be on the safe side in case something went wrong while growing, but somehow the file was never created (presumably because mdadm only needs it during the short critical section at the very beginning of the reshape). When I checked the output of /proc/mdstat again I got a little nervous: it told me that the reshape would take about 13000min at 1412K/sec. No kidding. After some googling I found a nice tweak to speed up the process a bit.
me@server:~$ cat /proc/sys/dev/raid/speed_limit_max
200000
me@server:~$ cat /proc/sys/dev/raid/speed_limit_min
1000
So I raised these values:
me@server:~$ sudo su
root@netzwerker:/home/me# echo 20000000 > /proc/sys/dev/raid/speed_limit_max
root@netzwerker:/home/me# echo 25000 > /proc/sys/dev/raid/speed_limit_min
root@netzwerker:/home/me# exit
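By the way, the same can be done in one line per value with sysctl, and an entry in /etc/sysctl.conf makes it survive a reboot:
me@server:~$ sudo sysctl -w dev.raid.speed_limit_min=25000
me@server:~$ sudo sysctl -w dev.raid.speed_limit_max=20000000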
This gave me a significant speed-up to about 5000K/sec. I didn't push any harder because atop showed me that the disks were already around 90% busy. After more than 14 hours I was still at
me@server:~$ cat /proc/mdstat 
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md0 : active raid5 sdc1[2] sdb1[1] sda1[0]
      1465135936 blocks super 0.91 level 5, 64k chunk, algorithm 2 [3/3] [UUU]
      [====>................]  reshape = 24.7% (362362752/1465135936) finish=3174.3min speed=5788K/sec

unused devices: <none>
but hey, haste makes waste, so I was patient. Finally, after 48 hours, the reshaping was finished. Now I ran
me@server:~$ sudo e2fsck -f /dev/md0
e2fsck 1.40.8 (13-Mar-2008)
Pass 1: Checking inodes, blocks, and sizes
Pass 2: Checking directory structure
Pass 3: Checking directory connectivity
Pass 4: Checking reference counts
Pass 5: Checking group summary information
/dev/md0: 240560/91578368 files (1.3% non-contiguous), 325896911/366283984 blocks
to check the filesystem on /dev/md0, which was still only 1.5TB. The check is not optional: resize2fs refuses to resize a filesystem that hasn't been through e2fsck -f first. After that I grew the filesystem to its final size of 3TB with
me@server:~$ sudo resize2fs -p /dev/md0
resize2fs 1.40.8 (13-Mar-2008)
Resizing the filesystem on /dev/md0 to 732567968 (4k) blocks.
Begin pass 1 (max = 11178)
Extending the inode table     XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
The filesystem on /dev/md0 is now 732567968 blocks long.
Done. I now have a working three disk RAID5 array with an effective storage capacity of 2.8TB. Thanks to the Linux community for such a great piece of software.
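One last thing worth checking: the --create step gave the array a brand new UUID, so if your /etc/mdadm/mdadm.conf pins md0 to the old one, the array may not assemble at boot. Print the current definition with
me@server:~$ sudo mdadm --detail --scan
and update the ARRAY line in /etc/mdadm/mdadm.conf accordingly. On ubuntu it's also a good idea to run sudo update-initramfs -u afterwards, since that config gets baked into the initramfs.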

Cheers.

Inspiration for the whole stunt I found mainly here, here and here, apart from using the big bad G.
