Initially, SSDs were often put into tiered storage as “Tier 0,” indicating both that their performance exceeded the usual Tier 1 of 15K RPM hard drives and that the industry wasn’t yet ready for anything faster. Now, storage arrays are engineered to include SSDs and take full advantage of their performance. Tiers 1 and 2 may both be SSD-based, with Tier 3 consisting of hard drives; if additional capacity is needed, either a fourth high-capacity hard drive tier or a tier of tape can be added. Tiers may even extend to cloud-based storage or off-site tape repositories.
However, the original reason for tiered storage still exists — the cost of the storage itself. While all-flash storage offers high performance, most organizations can’t afford it for all data, so tiered storage will continue to be popular for the foreseeable future.
The question then becomes how much of each kind of storage to allocate to each tier, and how many tiers to set up. That, in turn, depends on the types of data involved and the requirements for archiving, fault tolerance and regulatory compliance.
The Basics of Tiered Storage
Tiering is essentially a form of caching, and it doesn’t start with storage: CPUs have several layers of cache memory, and there may also be an in-memory cache for storage. In every case, the idea is to serve the required data as fast as possible, and the higher the cache layer, the faster and smaller it is. For storage, the fastest option is a RAM disk, followed by PCI-based flash, whether on a PCIe card or the newer M.2 spec flash drives. Next come standard SAS or SATA SSDs, followed by 15,000 and 10,000 RPM hard drives, then high-capacity 7,200 RPM near-line hard drives, and then on-site tape, which may be used either for archiving that is eventually sent off-site or for collections of data too large for disk storage, such as the very large data sets used in scientific research or video processing. Finally, there are the cloud and off-line tape.
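The caching analogy can be made concrete with a minimal sketch. The tier names, capacities and heat-based promotion policy below are illustrative assumptions, not any vendor's actual algorithm; real arrays track access patterns far more subtly.

```python
from collections import Counter

# Hypothetical tier names, fastest first; capacities in "blocks" for illustration.
TIERS = [("pcie_ssd", 2), ("sata_ssd", 4), ("nearline_hdd", 8)]

class TieredStore:
    def __init__(self):
        # Each tier holds the set of block IDs currently resident there.
        self.tiers = {name: set() for name, _ in TIERS}
        self.hits = Counter()  # access counts drive promotion

    def write(self, block):
        # New, cold data lands on the lowest (cheapest) tier.
        self.tiers[TIERS[-1][0]].add(block)

    def read(self, block):
        self.hits[block] += 1
        self._rebalance()
        for name, _ in TIERS:
            if block in self.tiers[name]:
                return name  # the tier the read was served from

    def _rebalance(self):
        # Rank blocks by heat and pack the hottest into the fastest tiers.
        ranked = [b for b, _ in self.hits.most_common()]
        all_blocks = set().union(*self.tiers.values())
        ranked += [b for b in all_blocks if b not in ranked]
        for name, _ in TIERS:
            self.tiers[name].clear()
        i = 0
        for name, cap in TIERS:
            while i < len(ranked) and len(self.tiers[name]) < cap:
                self.tiers[name].add(ranked[i])
                i += 1
```

After a few repeated reads of the same blocks, those blocks are promoted to the fastest tier, while rarely read blocks settle on the lower tiers, which is the essence of automated tiering.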
How Much to Allocate to Each Tier
Many tiering systems automatically identify the data most often in use and move it to the highest tier possible, ensuring the best possible performance for the data most in demand. It’s also possible to manually designate data to be kept in certain tiers, such as database index files that have a large effect on whole systems. In general, a tier needs only 10 to 20 percent of the capacity of the tier below it to ensure that almost all data is served from the highest tier. This is fortunate, since it means that the expense of the tiering hardware and software is offset by the ability to do more with less.
With two tiers of SSDs and one of hard drives, for example, you might have 500 TB of low-cost hard drive storage, 50 TB of relatively inexpensive SSD as a middle tier and 5 TB of high-speed PCIe SSD as the fastest tier, with all the data in the whole array available as if it were coming from the highest tier.
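The arithmetic behind that example can be sketched with a small helper. The function name and defaults are illustrative, assuming the 10 percent end of the article's 10-to-20-percent rule of thumb.

```python
def tier_sizes(base_tb, ratio=0.10, tiers=3):
    """Size each faster tier as a fraction of the tier below it.

    base_tb: capacity of the lowest (cheapest) tier, in TB.
    ratio:   fraction of the next-lower tier's capacity
             (the 10-20 percent rule of thumb).
    Returns sizes from slowest tier to fastest.
    """
    sizes = [base_tb]
    for _ in range(tiers - 1):
        sizes.append(sizes[-1] * ratio)
    return sizes

# The example from the text: 500 TB of hard drives, 10 percent per step.
print(tier_sizes(500))  # [500, 50.0, 5.0]
```

Using 20 percent instead would call for a 100 TB middle tier and a 20 TB top tier over the same 500 TB base.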
Other Tiering Considerations
Aside from speed and cost, there are other arguments for multiple tiers. For example, you might want a very high-speed RAID (redundant array of independent disks) 10 in the highest tier, and a more economical RAID 5 or RAID 6 for the lower tier. RAID 10 uses twice the raw capacity to produce a given volume size (4 x 4 TB drives will yield an 8 TB volume), but offers higher throughput and better latency. RAID 5 dedicates one drive’s worth of capacity in a group of three to seven drives to parity, and RAID 6 dedicates two, offering much higher usable capacity but lower performance. Thus you might use the same types of drives for Tiers 1 and 2, but with different RAID levels. For the lowest tiers, using a tape library offers the ability to move tapes off-site, increasing the ability to recover from disasters like storms or earthquakes that might destroy the whole data center.
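The usable-capacity trade-off between those RAID levels reduces to simple arithmetic. This sketch ignores real-world details such as formatting overhead and hot spares; the function name is illustrative.

```python
def usable_tb(level, drives, drive_tb):
    """Approximate usable capacity for a few common RAID levels."""
    if level == "raid10":
        # Mirrored pairs: half the raw capacity survives as usable space.
        return drives * drive_tb / 2
    if level == "raid5":
        # One drive's worth of capacity per group goes to parity.
        return (drives - 1) * drive_tb
    if level == "raid6":
        # Two drives' worth of capacity per group go to parity.
        return (drives - 2) * drive_tb
    raise ValueError(f"unsupported RAID level: {level}")

# The article's example: four 4 TB drives in RAID 10 yield an 8 TB volume.
print(usable_tb("raid10", 4, 4))  # 8.0
# The same four drives with parity-based redundancy instead:
print(usable_tb("raid5", 4, 4))   # 12
print(usable_tb("raid6", 4, 4))   # 8
```

The same four drives thus yield 50 percent more usable space under RAID 5 than under RAID 10, which is why parity RAID tends to appear in the lower, capacity-oriented tiers.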
Stretching Tiers Off-Site
Rather than shipping tapes off-site, which can take hours or days to retrieve when necessary, archiving to the cloud allows for quick, painless restoration in minutes, while still fulfilling the requirement that data be archived in a location that won’t be caught in the same disaster that might hit the data center.
Tiered storage offers the ability to make a less expensive storage system behave as if it were a more expensive, faster and more reliable one. Since only 10 to 20 percent of data is typically in active use, that smaller fraction is served from the faster tier, while the rest remains on a less expensive medium. As long as faster storage costs more, there will be a place for tiers of storage.
Read our white paper, “Tiered Storage Architecture: Taking Advantage of New Classes of Data Center Storage,” to learn more.