Online capacity expansion of a Dell hardware RAID 1 with larger disks and LVM

This post covers a simple topic, but information about it was hard to find because of the specific circumstances involved. I will show how to upgrade a simple 2-disk hardware RAID 1 on a Dell PowerEdge RAID Controller (PERC), in our case a PERC H700 in a Dell PowerEdge R310, with only a very short downtime for the LVM part.

Introduction

We had to upgrade a simple 2-disk RAID 1 (the second RAID 1 on that particular machine) with larger disks and without adding new ones, because the backplane was already fully populated.

The upgrade took place on a Dell PERC H700 controller, but it should work on other PERCs and on other LSI-based controllers as well.

Our goal was to replace both disks of the RAID with larger ones (from 1TB to 2TB per disk) with the least downtime possible. We also use LVM on top of the RAID, so we had to increase the capacity there too. Let's go!

Prerequisites

The first and most important step is to check whether your controller is capable of doing an Online Capacity Expansion (OCE) without adding additional drives.

For all of the following steps you need a working copy of [megacli](https://www.broadcom.com/site-search?q=megacli).

Check if we can do the OCE without adding additional drives:

root@machine:~# megacli -AdpAllInfo -a0 | grep OCE
Support the OCE without adding drives : Yes

If you see Yes here, you can proceed.
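If you want to script this check, for example as a guard at the top of a maintenance script, the grep above can be wrapped in a small function. This is only a sketch: the name `oce_supported` is made up for illustration, and it assumes the output line looks exactly like the one shown above.

```shell
#!/bin/sh
# Hypothetical helper: reads `megacli -AdpAllInfo` output on stdin and
# succeeds only if the controller reports OCE support without new drives.
oce_supported() {
    grep -Eq 'Support the OCE without adding drives[[:space:]]*:[[:space:]]*Yes'
}

# Usage on the real machine (not executed here):
#   megacli -AdpAllInfo -a0 | oce_supported || { echo "OCE not supported" >&2; exit 1; }
```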

Replacing the hard drives

Now that we know that our controller is capable of online capacity expansion, we can start replacing each currently running disk with its larger counterpart. Do this the same way you would replace a failing hard disk.

As mentioned in the beginning, we are replacing the two disks of the second RAID 1, so our slot numbers are 2 and 3 (disk bays 3 and 4 on the R310). The drives are addressed as [enclosure:slot], and our enclosure ID is 32. Substitute the enclosure and slot numbers of the disks you want to replace.
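If you are unsure which enclosure and slot numbers belong to which disk, the output of `megacli -PDList -a0` can be condensed into one line per drive. The following is a rough sketch, not part of the original procedure: the function name `pd_summary` is hypothetical, and it assumes the field labels "Enclosure Device ID", "Slot Number", "Raw Size" and "Firmware state" appear in that order, which may differ between MegaCli versions.

```shell
#!/bin/sh
# Hypothetical helper: condense `megacli -PDList` output (read from stdin)
# into one line per physical drive: [enclosure:slot] state (raw size).
pd_summary() {
    awk -F': *' '
        /^Enclosure Device ID/ { enc  = $2 }
        /^Slot Number/         { slot = $2 }
        /^Raw Size/            { size = $2 }
        /^Firmware state/      { printf "[%s:%s] %s (%s)\n", enc, slot, $2, size }
    '
}

# Usage on the real machine (not executed here):
#   megacli -PDList -a0 | pd_summary
```

Once you are sure about the IDs, take the first drive out of the array: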

megacli -PDOffline -PhysDrv [32:2] -a0
megacli -PDMarkMissing -PhysDrv [32:2] -a0
megacli -PDPRpRmv -PhysDrv [32:2] -a0

After that, physically swap the disk for the larger one. The rebuild should start automatically. Check its progress and wait until it has finished before continuing with the second drive:

megacli -PDRbld -ShowProg -PhysDrv [32:2] -aAll

When it's done, repeat those steps with the second drive in [32:3].
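Because the array has no redundancy while a rebuild is running, it can help to let a script wait for the rebuild instead of checking by hand. Below is a minimal sketch under a few assumptions: the function names are invented, adapter 0 is hard-coded as in the commands above, and the drive's state is read from the "Firmware state" line of `megacli -PDInfo`, which is expected to say "Rebuild" while the rebuild runs.

```shell
#!/bin/sh
# Hypothetical helper: extract the firmware state from `megacli -PDInfo` output.
pd_state() {
    awk -F': *' '/^Firmware state/ { print $2; exit }'
}

# Poll until the drive no longer reports "Rebuild", then print its final state.
# $1 = drive as [enclosure:slot], $2 = poll interval in seconds (default 60)
wait_for_rebuild() {
    while megacli -PDInfo -PhysDrv "$1" -a0 | pd_state | grep -q '^Rebuild'; do
        sleep "${2:-60}"
    done
    megacli -PDInfo -PhysDrv "$1" -a0 | pd_state
}

# Usage on the real machine (not executed here):
#   wait_for_rebuild '[32:2]'
```

The loop also ends if the rebuild has not started at all, so only pull the second disk once the printed state shows the drive as online.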

Enlarge the array

Now that both drives are replaced, we can tell the PERC controller to enlarge the array to 100% of the newly available space. The second command shows the progress. The expansion should finish fairly quickly.

Make sure that you choose the right logical drive with the -Lx option. For us it's the second array, and the count starts at 0.

megacli -LdExpansion -p100 -L1 -a0
megacli -LDRecon -ShowProg -L1 -a0

Before continuing, make sure that the operating system also recognizes the larger logical drive: echo 1 > /sys/block/sdb/device/rescan. Check that you choose the right block device. For our second RAID that's sdb.
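To confirm that the rescan worked, you can read the size the kernel reports for the device. The value in /sys/block/sdb/size is always given in 512-byte sectors, so a small conversion makes it easier to read. This is a sketch; the function name `sectors_to_gib` is made up for illustration.

```shell
#!/bin/sh
# Convert a sector count as found in /sys/block/<dev>/size (512-byte units)
# to whole GiB. One GiB is 2097152 sectors of 512 bytes.
sectors_to_gib() {
    echo $(( $1 / 2097152 ))
}

# Usage on the real machine (not executed here):
#   sectors_to_gib "$(cat /sys/block/sdb/size)"
```

For 2TB disks the result should be somewhere around 1860 GiB. If it still shows the old size of roughly 930 GiB, the expansion or the rescan has not gone through yet.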

Resize the LVM

This part is a bit tricky and needs your full attention. You can easily destroy your LVM in this step, because we need to recreate the array's partition table in order for the LVM to recognize the larger disk size.

First, stop all processes that use the logical volumes residing on this array.

We used our second RAID for the [Elasticsearch](https://www.elastic.co/products/elasticsearch) database. Therefore we need to stop ES first.

service elasticsearch stop

Now we can unmount the logical volume and then disable it temporarily, so it cannot be used by the system.

umount /dev/vg1/elasticsearch
lvchange -a n /dev/vg1/elasticsearch

Check that the LV is really inactive using lvscan.
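If you prefer a scripted check over reading the lvscan output, the lv_attr field of lvs can be tested instead: its fifth character is "a" while the logical volume is active. A small sketch, with a made-up function name:

```shell
#!/bin/sh
# Succeeds if the given lv_attr string marks the LV as active,
# i.e. its fifth character is "a" (for example "-wi-a-----").
lv_attr_is_active() {
    case "$1" in
        ????a*) return 0 ;;
        *)      return 1 ;;
    esac
}

# Usage on the real machine (not executed here):
#   attr=$(lvs --noheadings -o lv_attr /dev/vg1/elasticsearch | tr -d ' ')
#   lv_attr_is_active "$attr" && echo "LV is still active, do not continue" >&2
```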

Here comes the tricky part. We will delete and recreate the partition table for that array in order to enlarge the partition on which the LVM resides.

Open the array's partition table using fdisk /dev/sdb.
Print the current partition table by pressing p and note all values shown there, or take a screenshot.
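In addition to the notes or screenshot, it does no harm to save a machine-readable copy of the partition table with sfdisk -d before changing anything. The sketch below also pulls the start sector out of that dump so you can compare it later. The function name `partition_start` and the backup path are hypothetical. Note that sfdisk -d reports sectors, so the number can differ from what fdisk prints if your fdisk displays cylinders.

```shell
#!/bin/sh
# Hypothetical helper: print the start sector of one partition from
# `sfdisk -d` output read on stdin.
# $1 = partition device, e.g. /dev/sdb1
partition_start() {
    awk -v part="$1" '$1 == part { sub(/.*start= */, ""); sub(/,.*/, ""); print }'
}

# Usage on the real machine (not executed here; the path is just an example):
#   sfdisk -d /dev/sdb > /root/sdb.ptable.bak
#   partition_start /dev/sdb1 < /root/sdb.ptable.bak
```

If something goes wrong in fdisk, the saved dump can be written back with sfdisk /dev/sdb < /root/sdb.ptable.bak.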

Now press d and 1 to delete the first partition. In our case there is only one. Recreate it by pressing n, p and 1 to create a new primary partition. For the start value, enter exactly the start you noted in the step above (such as 1). It must match the old partition's start, otherwise LVM will not find its physical volume in the new partition. For the end value, just accept the default, which uses all of the available disk space.

As a last step, reapply the correct partition type Linux LVM (8e) by typing t, 1 and 8e.

Press p again and check that the new partition table matches the values you noted before, apart from the larger end value and size. If that's the case, save everything to disk by pressing w and Enter.

Now you can tell Linux to reread the new partition table by running partprobe. This makes sure that the kernel is aware of the new size of the partition that holds the LVM physical volume.

You can then resize the LVM physical volume, enable the logical volume again and resize it to the maximum amount of available space with:

pvresize /dev/sdb1
lvchange -a y /dev/vg1/elasticsearch
lvresize -l +100%FREE /dev/vg1/elasticsearch

Remount the logical volume and resize the filesystem

As a last step we can remount the logical volume by issuing mount -a, which reads the mount information from /etc/fstab and mounts everything that is not currently mounted.

Now simply do an online resize of the filesystem (resize2fs handles ext2/3/4 filesystems) and start the services again, in our case Elasticsearch:

resize2fs /dev/vg1/elasticsearch
service elasticsearch start

Conclusion

I had a hard time finding the right commands for the online capacity expansion on the RAID controller. Everything else was more or less the same as what you need for similar RAID operations. I hope this helps people who want to enlarge their existing arrays without spending hours researching the right commands and how to proceed.