About the Author

Douglas Eadline, PhD, is both a practitioner and a chronicler of the Linux Cluster HPC revolution. He has worked with parallel computers since 1988 and is a co-author of the original Beowulf How To document. Prior to starting and editing the popular http://clustermonkey.net web site in 2005, he served as Editor-in-chief for ClusterWorld Magazine. He is currently Senior HPC Editor for Linux Magazine and a consultant to the HPC industry. Doug holds a Ph.D. in Analytical Chemistry from Lehigh University and has been building, deploying, and using Linux HPC clusters since 1995.

A blog about making HPC things (kind of) work

RAID 5 protects against a single disk failure. Remember that.

Those wonderfully cheap and big SATA drives are commonly specified with an unrecoverable read error (URE) rate of one error per 10^14 bits read. This means that, on average, once every 10^14 bits the drive will be unable to read a sector. This number used to be considered pretty big. It turns out that it has recently become small in relation to drive size.

Consider a 7 drive RAID 5 that uses 2TB drives. That is a lot of cheap storage (12TB or 0.96 x 10^14 bits). When one drive fails you need to read through the remaining 6 drives, all 12TB of them, to reconstruct the data. That is almost exactly the amount of data in which the spec says to expect one URE, so chances are a drive will have a URE in this process. Read the first line again. Your RAID5 array is borked. Time to go get the backup tape. You have tapes, right?
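To put a rough number on "chances are", here is a minimal Python sketch of the arithmetic. It assumes things the drive spec sheet does not promise: that read errors are independent and spread evenly at exactly the rated 1 in 10^14 bits, and that a "2TB" drive holds 2 x 10^12 bytes. The function name is made up for this illustration.

```python
import math

def p_ure_during_read(bits_read, bits_per_error=1e14):
    """Chance of hitting at least one URE while reading bits_read bits.

    Assumes independent errors at the rated interval (a simplification).
    """
    # 1 - (1 - 1/N)^bits, computed with log1p/expm1 to avoid rounding loss
    return -math.expm1(bits_read * math.log1p(-1.0 / bits_per_error))

surviving_drives = 6       # 7 drive RAID 5 with one drive failed
drive_bytes = 2e12         # 2TB, decimal (assumption)
bits_to_read = surviving_drives * drive_bytes * 8

print(f"bits to read during rebuild: {bits_to_read:.2e}")
print(f"chance of a URE during rebuild: {p_ure_during_read(bits_to_read):.2f}")
```

Under those assumptions the rebuild has roughly a 6 in 10 chance of tripping over a URE. Real drives may do better or worse than their rated figure, so treat this as a back-of-the-envelope estimate and not a prediction.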

The problem is due to the fact that URE rates have remained constant as densities have grown. There are two solutions (for now). First, look for drives with a URE rating better than 10^14 (that is, more bits read per error, such as 10^15). These may cost more, but they may save your data someday. And second, use RAID6.
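The effect of drive size and URE rating can be sketched with the same kind of estimate. The Python below is an illustration only: it assumes independent errors at exactly the rated interval, decimal terabytes, and a 7 drive RAID 5 in which all 6 surviving drives are read in full. The function name and the list of drive sizes are made up for this example.

```python
import math

def rebuild_ure_probability(drives, drive_tb, bits_per_error):
    """Chance of at least one URE while reading every surviving drive.

    Assumes independent errors at the rated interval (a simplification).
    """
    bits = (drives - 1) * drive_tb * 1e12 * 8   # decimal TB to bits
    return -math.expm1(bits * math.log1p(-1.0 / bits_per_error))

print("drive size   1 in 10^14   1 in 10^15")
for drive_tb in (1, 2, 3, 4):
    p14 = rebuild_ure_probability(7, drive_tb, 1e14)
    p15 = rebuild_ure_probability(7, drive_tb, 1e15)
    print(f"{drive_tb:>7} TB   {p14:>10.2f}   {p15:>10.2f}")
```

With the rating held fixed, the estimated risk climbs as the drives get bigger, while a rating ten times better pulls it back down. That is the whole argument for paying more for the better drives.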

A RAID 6 array protects against a double drive failure.

As you start looking at those 3TB SATA consumer drives, read the previous paragraph again. As unlikely as it may sound, two drives failing during a rebuild could turn out to be a statistical certainty as drive densities increase.

Personally, I use RAID1 or RAID10. I can be very happy with 4TB of RAID10 storage (using four low cost 2TB SATA drives). I am also comforted to know there is a full copy of each drive in case things go south. The likelihood of the same two sectors going bad on two drives is very small, and with Linux software RAID I can mount each drive independently (as a single drive, not as RAID) if needed. Finally, remember RAID is disk based and not file based. RAID will rebuild a 2TB drive even if there is only one small failed file on the drive.