Many considerations go into the design of a data storage system. One of the most important is data durability: minimizing the likelihood of data loss. We’ll use several data durability models to evaluate the scale at which RAID-based storage solutions become less reliable than replication-based solutions. With this knowledge, a storage solution architect can make better-informed decisions about platform selection.

Redundant Arrays of Independent Disks and their Limitations

For smaller-scale solutions, RAID has been used successfully to protect against data corruption and drive failures [1, 2]. Over the last decade, however, drive capacity growth has outpaced disk I/O performance gains, leading to longer RAID rebuild times after a drive failure. Larger-capacity drives have also increased the probability of encountering unrecoverable errors due to data corruption during a RAID rebuild. Together, these two factors have made RAID-based storage systems less than ideal for large-scale deployments [3]. We’ll use a Markov chain model to analyze RAID data durability, taking into account both data corruption and drive failures [2].
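To make these two effects concrete, the following Python sketch estimates how rebuild time and the chance of hitting an unrecoverable error grow with drive capacity. It is a back-of-the-envelope illustration, not the Markov chain model from [2]. The function names, the 100 MB/s sustained rebuild throughput, the sample capacities, and the reading of the error rate as one error per 10^15 bits read are all assumptions made for this example.

```python
import math


def rebuild_hours(capacity_bytes, throughput_bytes_per_s):
    """Hours needed to rewrite a whole drive at a constant throughput."""
    return capacity_bytes / throughput_bytes_per_s / 3600.0


def p_unrecoverable_error(capacity_bytes, errors_per_bit):
    """Probability of at least one unrecoverable error while reading the
    full drive once, assuming independent errors per bit read."""
    bits_read = capacity_bytes * 8
    # 1 - (1 - p)^n, computed with log1p/expm1 to stay accurate for tiny p.
    return -math.expm1(bits_read * math.log1p(-errors_per_bit))


if __name__ == "__main__":
    TB = 1e12                 # decimal terabyte, as used by drive vendors
    THROUGHPUT = 100e6        # assumed sustained rebuild rate, bytes/s
    ERRORS_PER_BIT = 1e-15    # assumed: one unrecoverable error per 10^15 bits

    for capacity_tb in (1, 3, 6):
        capacity = capacity_tb * TB
        hours = rebuild_hours(capacity, THROUGHPUT)
        p_err = p_unrecoverable_error(capacity, ERRORS_PER_BIT)
        print(f"{capacity_tb} TB: rebuild ~{hours:.1f} h, "
              f"P(unrecoverable error on full read) ~{p_err:.2%}")
```

Under these assumptions both quantities grow with capacity, and unless throughput improves at the same pace as capacity, each new drive generation spends longer in a degraded state.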

Replication-based Storage Solutions

One way to ensure data integrity in large distributed systems is through replication. If one copy of the stored data is lost or corrupted, at least one more identical replica remains available for reconstruction and user access. Replication-based systems favor availability and partition tolerance over consistency. A lot of work has been done on modeling data durability in replicated systems; in this post we take that effort one step further by leveraging a detailed Markov chain model of OpenStack’s object storage solution, Swift.

When RAID is not Enough

Let’s consider what happens to data durability in a RAID-based system and a Swift-based system as they scale up. Here we assume 3-way RAID 1 for the RAID solution and replication in triplicate for the Swift solution. The overhead in these RAID and Swift configurations is identical, leaving 33% of the total capacity available for use. The models use Seagate Constellation ES3 3 TB drives, which have an enterprise-grade mean time between failures (MTBF) of 1.4 million hours and an unrecoverable error rate (UER) of 1 in 10^15. Mean time to replacement for RAID drives is assumed to be 24 hours, while rebuild time is calculated from the drive capacity and performance.

Figure 1 below shows that replication-based object storage provides superior data protection for this particular example over the entire range considered. Realistically, the level of data durability provided by RAID becomes insufficient in the 1 PB range, or around the 300-drive mark, and the advantage of the Swift solution grows with additional storage capacity. Using lower-cost drives will give Swift even more of an advantage. This modeling exercise suggests that in the PB range replication-based data protection schemes should be used, as they provide better data durability than RAID with the same overhead.
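As a rough illustration of why durability drops as drives are added, here is a minimal Python sketch of a birth-death Markov chain for a single group of mirrored drives. It is a simplification and not the model behind Figure 1: it ignores unrecoverable errors, does not model Swift at all, allows only one repair at a time, and approximates a system of many independent groups by dividing the single-group MTTDL by the number of groups. The function names and the 100 MB/s rebuild throughput are assumptions introduced for this example; the MTBF, drive size, and 24-hour replacement time are the values quoted above.

```python
import math

HOURS_PER_YEAR = 24 * 365


def mttdl_mirror_group(copies, mtbf_hours, repair_hours):
    """Mean time to data loss (hours) for one group of mirrored drives.

    State i is the number of failed drives in the group; the state with
    all `copies` drives failed is absorbing (data loss).
    """
    fail_rate = 1.0 / mtbf_hours
    repair_rate = 1.0 / repair_hours   # math.inf repair time means no repair
    total = 0.0
    step = 0.0  # expected time to first advance from the previous state
    for failed in range(copies):
        f = (copies - failed) * fail_rate        # any surviving drive may fail
        r = repair_rate if failed > 0 else 0.0   # assumed: one repair at a time
        # Expected time to first reach `failed + 1` starting from `failed`.
        step = 1.0 / f + (r / f) * step
        total += step
    return total


def mttdl_system(drives, copies, mtbf_hours, repair_hours):
    """Approximate system MTTDL, assuming independent mirror groups."""
    groups = drives // copies
    return mttdl_mirror_group(copies, mtbf_hours, repair_hours) / groups


if __name__ == "__main__":
    MTBF_HOURS = 1.4e6                    # from the drive specification above
    DRIVE_TB = 3
    REBUILD_HOURS = 3e12 / 100e6 / 3600   # assumed 100 MB/s rebuild rate
    REPAIR_HOURS = 24 + REBUILD_HOURS     # replacement delay plus rebuild

    for drives in (30, 300, 3000):
        raw_tb = drives * DRIVE_TB
        usable_tb = raw_tb / 3            # three copies of every object
        years = mttdl_system(drives, 3, MTBF_HOURS, REPAIR_HOURS) / HOURS_PER_YEAR
        print(f"{drives} drives: {raw_tb} TB raw, {usable_tb:.0f} TB usable, "
              f"MTTDL ~{years:.3g} years")
```

Because drive failures alone are so rare in this simplified chain, its absolute numbers come out far more optimistic than a model that also accounts for data corruption. The point of the sketch is only the trend: MTTDL shrinks roughly in proportion to the number of drives.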

Figure 1. Mean time to data loss (MTTDL) for Swift and RAID-based systems with different numbers of drives (capacity).


[1] David A. Patterson, Garth Gibson, and Randy H. Katz, A Case for Redundant Arrays of Inexpensive Disks (RAID), ACM SIGMOD, Volume 17, Issue 3, June 1988

[2] Sarah Edge Mann, Michael Anderson, and Marek Rychlik, On the Reliability of RAID Systems: An Argument for More Check Drives, arXiv, Feb 2012

[3] R. Appuswamy, D. C. van Moolenbroek, and A. S. Tanenbaum, Block-level RAID is Dead, in Proc. of the Second USENIX Workshop on Hot Topics in Storage and File Systems, USENIX Association, 2010

Author: Dimitar Vlassarev
