First of all, here is why RAID 5 is so bad and why it effectively stopped working in 2009.
To recap, there are two things to blame here:
- UREs (Unrecoverable Read Errors)
- Disk storage capacity
Here is an example to illustrate why running RAID 5 on large-capacity hard drives is such a bad idea.
Take a RAID 5 setup with seven 2 TB drives. When one drive fails, you have six 2 TB drives remaining. After you put in a new 2 TB drive, the resilver process kicks off to rebuild the array. Because the RAID controller needs to read through all six remaining disks, 12 TB of data in total, to reconstruct the data from the failed drive, there is a very high chance that it will hit a URE during the process. When that happens, it counts as a second drive failure in the array, which simply means Game Over.
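To put a rough number on that risk, here is a minimal sketch of the arithmetic. It assumes a URE rate of one error per 10^14 bits read, a figure often quoted on consumer hard drive spec sheets but not one taken from this post, and it treats every bit read as an independent event, which is a simplification. The function name and its parameters are made up for illustration.

```python
import math

def rebuild_ure_probability(surviving_drives, drive_bytes, ure_per_bit=1e-14):
    """Chance of at least one URE while reading every surviving drive in full.

    ure_per_bit is an ASSUMED spec-sheet style rate (1 error per 1e14 bits),
    and bit errors are assumed to be independent of each other.
    """
    bits_to_read = surviving_drives * drive_bytes * 8
    # P(no URE) = (1 - rate) ** bits; log1p/expm1 keep precision for a tiny rate.
    return -math.expm1(bits_to_read * math.log1p(-ure_per_bit))

# Six surviving 2 TB drives (decimal terabytes), as in the example above.
p = rebuild_ure_probability(6, 2e12)
print(f"Chance of hitting a URE during the rebuild: {p:.0%}")
```

Under those assumptions the rebuild is more likely to hit a URE than not, and the odds only get worse as drives grow. Real drives may do better or worse than their quoted rate, so treat this as an order-of-magnitude estimate.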
I actually had that exact nightmare before on one of the backup servers. I ended up having to rebuild everything from scratch.
However, the theory applies to traditional Winchester hard drives (spindle drives), which have a pretty high URE rate no matter how reliable they claim to be. What about the SSDs that are slowly taking over the whole world?
Surprisingly, it seems to be absolutely fine to use SSDs in a RAID 5 array. Here is a nice rundown by Scott Alan Miller:
- SSDs generally just don’t have UREs, so a second disk failure due to a URE during the resilver process is practically a non-issue.
- Time to reconstruct the data from the failed drive is hugely reduced.
- The resilver impact is much reduced, as SSDs handle non-sequential data access so much better.
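As a back-of-the-envelope illustration of the rebuild-time point, the sketch below divides the capacity of the replacement drive by a sustained write speed. The two throughput figures (150 MB/s for a spindle drive, 500 MB/s for a SATA SSD) are assumed round numbers, not measurements from this post, and the helper name is hypothetical. A real resilver on a busy array will usually take longer than this best case.

```python
def rebuild_hours(drive_bytes, sustained_bytes_per_sec):
    """Best-case time to fill a replacement drive at a constant throughput."""
    return drive_bytes / sustained_bytes_per_sec / 3600

# ASSUMED throughputs, for illustration only.
HDD_BYTES_PER_SEC = 150e6   # spindle drive, sequential
SSD_BYTES_PER_SEC = 500e6   # SATA SSD, sequential

hdd_hours = rebuild_hours(2e12, HDD_BYTES_PER_SEC)
ssd_hours = rebuild_hours(2e12, SSD_BYTES_PER_SEC)
print(f"2 TB rebuild: HDD ~{hdd_hours:.1f} h, SSD ~{ssd_hours:.1f} h")
```

The shorter the rebuild window, the less time the array spends without redundancy, which is the real benefit here.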
The only remaining concern is the lifespan of the SSDs. For example, if you put six 1 TB SSDs from the same brand in a RAID 5 array, there is a chance that two of them will die at around the same time down the road, because they share the same endurance lifespan. You could pick same-size SSDs from different brands, or intentionally force a disk failure in the array, e.g. hot-swap the drives one at a time with a spare, to stagger their lifespans.
Now, I am on my way to building a new SSD RAID 5 array.
A few more good reads on disk arrays:
- http://www.smbitjournal.com/2012/12/the-history-of-array-splitting
- http://www.smbitjournal.com/2012/11/one-big-raid-10-a-new-standard-in-server-storage
- http://www.smbitjournal.com/2012/11/choosing-raid-for-hard-drives-in-2013
- http://www.smbitjournal.com/2012/11/choosing-a-raid-level-by-drive-count
- http://www.smbitjournal.com/2012/11/hardware-and-software-raid
- http://www.smbitjournal.com/2012/08/nearly-as-good-is-not-better
- http://www.smbitjournal.com/2012/07/hot-spare-or-a-hot-mess
- http://www.smbitjournal.com/2012/05/when-no-redundancy-is-more-reliable
- http://www.smbitjournal.com/2011/09/spotlight-on-smb-storage
- http://www.zdnet.com/blog/storage/why-raid-6-stops-working-in-2019/805
- http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162
- http://queue.acm.org/detail.cfm?id=1670144