Redundant array of independent disks (RAID) just had its 30th anniversary, and it’s showing its age. The technology is being reworked to improve rebuild times after a drive fails and is replaced, and to take into account the higher reliability and performance of SSDs. Rather than replacing RAID storage with SSDs, vendors are adding new types of RAID and other types of redundancy.
In its parity-based forms, RAID stripes data across at least three disks, so that even if one of the disks in the array fails, all the data can be reconstructed. Performance also improves because data is read from and written to multiple disks simultaneously; in the best case, throughput rises by roughly the same factor as the number of data drives in the array.
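The reconstruction described above can be illustrated with XOR parity, the technique used by single-parity RAID levels. The following is a minimal sketch, not a real RAID implementation: the helper `xor_blocks` and the four-byte sample blocks are hypothetical, chosen only to show that a lost block equals the XOR of the surviving blocks and the parity.

```python
from functools import reduce

def xor_blocks(blocks):
    """XOR equal-length byte blocks together, byte by byte."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

# Hypothetical stripe on three disks: two data blocks plus one parity block.
data_blocks = [b"RAID", b"1988"]
parity = xor_blocks(data_blocks)

# Simulate losing the disk that held data_blocks[0]:
# rebuild its block from the surviving data block and the parity block.
rebuilt = xor_blocks([data_blocks[1], parity])
print(rebuilt == data_blocks[0])
```

The same XOR works for any number of data blocks in a stripe, which is why a single-parity array can survive exactly one missing drive.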
Typical types of RAID include RAID 0, RAID 1, RAID 10 (also written RAID 1+0, and often confused with the differently layered RAID 0+1), RAID 3, RAID 5, RAID 6, RAID 50, RAID 60 and beyond. Each type of RAID has a different scheme for distributing data across multiple drives, with varying levels of efficiency and performance.
A typical RAID 5 array might use five identical devices, which can be hard drives, SSDs or other types of storage, even tapes. Enterprise SSDs, such as Samsung Enterprise SSDs, provide high performance and better reliability than HDDs. Data and parity blocks are striped across all five devices, with the parity for each stripe rotating from one drive to the next (a dedicated parity drive is characteristic of RAID 3 and RAID 4 instead). Because the equivalent of one drive’s capacity is used for parity, 80 percent of the total capacity is available for storage, so efficiency is 80 percent.
RAID 0 only includes data drives, so efficiency is 100 percent, but there’s no protection if a drive fails. RAID 1 uses pairs of drives, each replicating the other, which means that efficiency is 50 percent: half of all the capacity is used for redundancy. Some RAID schemes, such as RAID 6, store additional parity so that two drives can fail without losing data.
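The efficiency figures above follow from simple arithmetic. Here is a rough sketch, assuming identical drives, two-way mirroring for RAID 1, one drive's worth of parity for RAID 5 and two for RAID 6; the function name `storage_efficiency` and the minimum drive counts it enforces are illustrative assumptions, not any vendor's API.

```python
def storage_efficiency(level, drives):
    """Return the usable fraction of raw capacity for a few common RAID levels.

    Assumes identical drives; RAID 1 is treated as two-way mirroring.
    """
    if level == "RAID 0":
        return 1.0                      # no redundancy at all
    if level == "RAID 1":
        return 0.5                      # every block is stored twice
    if level == "RAID 5":
        if drives < 3:
            raise ValueError("RAID 5 needs at least three drives")
        return (drives - 1) / drives    # one drive's worth of parity
    if level == "RAID 6":
        if drives < 4:
            raise ValueError("RAID 6 needs at least four drives")
        return (drives - 2) / drives    # two drives' worth of parity
    raise ValueError(f"unsupported level: {level}")

for level, drives in [("RAID 0", 5), ("RAID 1", 2), ("RAID 5", 5), ("RAID 6", 6)]:
    print(f"{level} with {drives} drives: {storage_efficiency(level, drives):.0%} usable")
```

Note that parity efficiency improves as drives are added, while the number of failures the array tolerates stays fixed.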
Managing Data on Multiple Levels
All the different levels of RAID are attempts to balance performance and protection. Beyond day-to-day performance, there is another issue: drive sizes have grown from hundreds of megabytes to multiple terabytes since RAID was first developed, so RAID 5, 1 and 0+1 arrays can take a very long time to rebuild after a drive fails and is replaced. If more drives fail than the array can tolerate before the rebuild completes, the data can be permanently lost.
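A back-of-the-envelope estimate shows why larger drives stretch rebuild times. This sketch assumes the rebuild must rewrite the whole replacement drive at a constant sustained rate; the capacities and rates below are illustrative assumptions, not measurements, and real rebuilds also depend on controller behavior and competing workload.

```python
def rebuild_hours(capacity_bytes, rebuild_bytes_per_sec):
    """Estimate hours needed to rewrite a full replacement drive at a constant rate."""
    if rebuild_bytes_per_sec <= 0:
        raise ValueError("rebuild rate must be positive")
    return capacity_bytes / rebuild_bytes_per_sec / 3600

# Assumed figures for illustration only.
early_drive = rebuild_hours(500e6, 5e6)     # 500 MB drive at 5 MB/s
modern_drive = rebuild_hours(10e12, 100e6)  # 10 TB drive at 100 MB/s

print(f"500 MB drive: about {early_drive * 60:.1f} minutes")
print(f"10 TB drive: about {modern_drive:.1f} hours")
```

Under these assumptions, capacity has grown far faster than sustained throughput, so the window during which a second failure could be fatal is much longer.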
Given that SSDs are generally much more reliable than hard drives, especially hard drives from the days when RAID was first developed, some people advocate doing away with RAID and connecting a bunch of disks in a JBOD (derived from “just a bunch of disks”), which doesn’t include the parity function of RAID.
Depending on how the disks are presented, this can add capacity and aggregate throughput over a single drive, but without the ability to recover data in the event of a disk failure. From a performance standpoint, a single SSD such as the Samsung PM863a can outperform an array of HDDs, with better transfer speeds, latency and input/output operations per second (IOPS).
Updating Schemes for Data Processing
Many experienced admins still want RAID for redundancy, even if SSDs are more reliable than HDDs. In addition, new schemes for redundancy — including erasure coding, RAID 50 and 60 and replication — are needed as data moves into the cloud, where hardware access at the level needed for RAID may not be available.
The new schemes include both virtual RAID, where files are divided into multiple parts, with each part stored on a separate device, and erasure coding. Erasure coding divides data into data and parity fragments. For example, with 10 data drives and six parity drives, any six drives can be lost without losing data; losing a seventh means the data is gone.
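The 10-plus-six example can be checked with a short calculation. This sketch assumes an erasure code in which any set of fragments equal in number to the data fragments is enough to rebuild the data (as with Reed-Solomon codes); it models only the counting, not the encoding itself, and the names `erasure_profile` and `is_recoverable` are hypothetical.

```python
def erasure_profile(data_fragments, parity_fragments):
    """Summarize capacity cost and fault tolerance of a data+parity layout."""
    total = data_fragments + parity_fragments
    return {
        "efficiency": data_fragments / total,   # usable share of raw capacity
        "overhead": total / data_fragments,     # raw bytes stored per usable byte
        "max_losses": parity_fragments,         # fragments that may be lost
    }

def is_recoverable(data_fragments, parity_fragments, lost):
    """Data survives while at least `data_fragments` fragments remain."""
    return data_fragments + parity_fragments - lost >= data_fragments

profile = erasure_profile(10, 6)
print(profile)
print(is_recoverable(10, 6, lost=6), is_recoverable(10, 6, lost=7))
```

For comparison, keeping three full copies of the data would also survive more than one loss, but at three times the raw capacity rather than 1.6 times.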
With erasure coding, data and parity drives can be located in different places, rather than the single box that RAID generally requires. Further, erasure coding can reconstruct data more quickly than RAID, getting a data store back to normal sooner.
Another scheme is replication, which creates multiple copies of data, often in different data centers. A typical scheme might replicate each file separately, in at least three of five data centers. This is a complex scenario, because data must be synchronized between the different locations, but ensures that even if an entire data center is lost, all the data can be reconstructed and will be continuously available.
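The three-of-five scheme can be sanity-checked by enumerating every combination of failed data centers. This is a minimal sketch that ignores synchronization entirely and only asks whether some copy survives; the site names and the function `survives_all_failures` are hypothetical.

```python
from itertools import combinations

def survives_all_failures(replica_sites, all_sites, failures):
    """True if at least one replica remains for every way `failures` sites can fail."""
    replicas = set(replica_sites)
    return all(replicas - set(failed) for failed in combinations(all_sites, failures))

sites = ["dc1", "dc2", "dc3", "dc4", "dc5"]   # hypothetical data centers
file_replicas = ["dc1", "dc3", "dc5"]         # one file stored in three of the five

print(survives_all_failures(file_replicas, sites, failures=2))
print(survives_all_failures(file_replicas, sites, failures=3))
```

With three copies, any two sites can be lost; the file is at risk only if all three sites that hold it fail together.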
Defining the Right Redundancy Scheme
Picking a redundancy scheme may not be up to the administrator. Enterprise storage systems from vendors use many different types of physical and virtual RAID, erasure coding, replication, cloning and other schemes to improve performance and redundancy.
The administrator should understand how the various schemes work and interact, and what the trade-offs are for each method: performance versus levels of safety, recovery times and availability with degraded systems, at the individual system, data center and geographic levels.
While RAID is aging, it still has its place, and given the critical nature of data to the enterprise, few administrators will be willing to do without redundancy altogether. The newer forms of RAID, as well as alternatives such as erasure coding and replication, provide data protection while ameliorating the downsides of the original forms.