
Why is Parity RAID (RAID 3,4,5,6) a bad idea?

Rebuilds expose Unrecoverable Read Errors

Parity RAID rebuilds require reading every remaining block in the array, which makes unrecoverable read errors (UREs) increasingly likely to be encountered as disk sizes grow.
http://www.zdnet.com/blog/storage/why-raid-5-stops-working-in-2009/162

Reads fail

SATA drives are commonly specified with an unrecoverable read error (URE) rate of one error per 10^14 bits read. That means that roughly once every 100,000,000,000,000 bits, the disk will very politely tell you that, so sorry, but I really, truly can’t read that sector back to you.

One hundred trillion bits is about 12 terabytes. Sound like a lot? Not in 2009.
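
A quick back-of-the-envelope check of that figure (a minimal Python sketch; the only input is the 1-per-10^14-bits URE rate quoted above):

  # Convert the quoted URE interval (1 error per 10^14 bits) into terabytes read.
  ure_interval_bits = 10**14
  ure_interval_bytes = ure_interval_bits / 8          # 1.25e13 bytes
  print(ure_interval_bytes / 10**12, "TB")            # -> 12.5 (decimal) terabytes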

Disk capacities double

Disk drive capacities double every 18-24 months. We have 1 TB drives now, and in 2009 we’ll have 2 TB drives.

When a disk fails in a 7-drive RAID 5 array of 2 TB drives, you’ll have 6 remaining 2 TB drives. As the RAID controller busily reads through those 6 disks (about 12 TB) to reconstruct the data from the failed drive, it is almost certain to see a URE.
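
A rough sketch of why that is, assuming each bit read has an independent 1-in-10^14 chance of being unreadable (a simplification of real drive behaviour, but it matches the article's arithmetic):

  # Probability that all 6 surviving 2 TB drives are read without a single URE,
  # assuming independent errors at the quoted rate of 1 per 10^14 bits.
  bits_to_read = 6 * 2 * 10**12 * 8        # 6 drives x 2 TB each, in bits (9.6e13)
  p_bit_ok = 1 - 1 / 10**14                # chance a single bit reads back fine
  p_rebuild_ok = p_bit_ok ** bits_to_read  # chance the whole rebuild sees no URE
  print(f"{p_rebuild_ok:.0%}")             # roughly 38%: most rebuilds hit at least one URE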

So the read fails. And when that happens, you are one unhappy camper. The message “we can’t read this RAID volume” travels up the chain of command until an error message is presented on the screen. 12 TB of your carefully protected - you thought! - data is gone. Oh, you didn’t back it up to tape? Bummer!

Unrecoverable Read Errors Lead to a Faulted Array

A comment on the follow-up article, Sorry about your broken RAID 5, explains why most controllers will fail an array if a URE is encountered during a rebuild (and that some advanced controllers won't, but will instead mark the affected stripe as bad).

http://www.zdnet.com/tb/1-36542-673103

From the comment:

Very smart RAID algorithms know how to “mark” a stripe as bad (and even mark which sectors in the stripe are bad) when an URE is encountered during rebuild. […] So, if you have a very smart RAID-5 controller, you can get a URE during rebuild and not fail the entire volume. Only the URE sector and the corresponding rebuilt sector on the replacement drive will be corrupted. If the application has access to the correct data (e.g., from a backup), then the good data can be written to the logical volume, and the RAID algorithm may be smart enough to rebuild the stripe using the good data and restore coherency across the stripe. But this only happens if the application writes good data to the bad stripe, and that's likely to involve an actual person understanding a cryptic message from the OS, finding the right backup, and restoring the right information from the backup.
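
A toy sketch of the difference the comment describes (every name here is hypothetical, not any real controller's firmware or API): a naive controller aborts and faults the array on the first URE, while a smarter one records the damaged stripe and carries on.

  # Illustrative only: a simplified rebuild loop contrasting "fail the array"
  # with "mark the stripe bad and keep going".
  class UnrecoverableReadError(Exception):
      pass

  def rebuild(stripes, read_stripe, strict=True):
      """Rebuild onto a replacement drive; read_stripe() raises on a URE."""
      bad_stripes = []
      for n in stripes:
          try:
              data = read_stripe(n)       # read the surviving members of stripe n
              # ... recompute the missing block from data + parity here ...
          except UnrecoverableReadError:
              if strict:
                  # naive behaviour: a single URE faults the whole array
                  raise RuntimeError(f"array faulted: URE in stripe {n}")
              # smarter behaviour: remember the damaged stripe, keep rebuilding
              bad_stripes.append(n)
      return bad_stripes                  # stripes that need a restore from backup

  # e.g. bad = rebuild(range(num_stripes), controller_read, strict=False)
  # where controller_read is whatever actually reads a stripe (hypothetical here).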

How Will Your Controller React?

The question anyone running parity RAID needs to answer is how their array controller will react to this issue, and how the O/S reacts to being notified of an error by the array controller. We're not talking about a read error from the O/S's point of view; we're talking about data that was never requested being marked bad. How does the O/S work with the controller to reconcile the affected files (or directories) without ever having requested access to them? And how will the system administrator react to that?

Here's some additional information about RAID 5 and its problems: http://www.miracleas.com/BAARF/BAARF2.html

Silent Bit Rot

Even Worse: Silent “bit rot” due to Partial Media Failures

From Art Kagel's article on BAARF, RAID5 versus RAID10 (or even RAID3 or RAID4):

The problem is that despite the improved reliability of modern drives and the improved error correction codes on most drives, and even despite the additional 8 bytes of error correction that EMC puts on every Clariion drive disk block (if you are lucky enough to use EMC systems), it is more than a little possible that a drive will become flaky and begin to return garbage. This is known as partial media failure.

Now SCSI controllers reserve several hundred disk blocks to be remapped to replace fading sectors with unused ones, but if the drive is going these will not last very long and will run out and SCSI does NOT report correctable errors back to the OS! Therefore you will not know the drive is becoming unstable until it is too late and there are no more replacement sectors and the drive begins to return garbage. [Note that the recently popular IDE/ATA drives do not (TMK) include bad sector remapping in their hardware so garbage is returned that much sooner.]

When a drive returns garbage, since RAID5 does not EVER check parity on read (RAID3 & RAID4 do BTW and both perform better for databases than RAID5 to boot) when you write the garbage sector back garbage parity will be calculated and your RAID5 integrity is lost! Similarly if a drive fails and one of the remaining drives is flaky the replacement will be rebuilt with garbage also propagating the problem to two blocks instead of just one.
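
A minimal illustration of the mechanism Kagel describes, using single-byte "blocks" and XOR parity (a toy example, not a real RAID implementation): because a RAID 5 read never checks parity, a corrupted data block is returned as-is, and when that block is later written back the parity is recalculated from the garbage, silently baking the corruption in.

  # Toy 3-disk RAID 5 stripe: two single-byte data "blocks" plus XOR parity.
  d0, d1 = 0x5A, 0x3C
  parity = d0 ^ d1                      # parity as originally written

  # Partial media failure: the drive returns garbage for d0 but reports success.
  d0_read = 0xFF

  # The RAID 5 read path never consults parity, so the garbage goes unnoticed,
  # even though a parity check would have flagged it:
  assert (d0_read ^ d1) != parity

  # The application later writes that block back exactly as it read it,
  # and the controller recalculates parity from the garbage value:
  d0 = d0_read
  parity = d0 ^ d1                      # stripe is now "consistent" around garbage

  # From here on, parity agrees with the corrupted data, so the original
  # contents can no longer be reconstructed from the array itself.
  assert (d0 ^ d1) == parity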
