This article is part of the Technology Insight series, made possible by funding from Intel.
As global data volumes continue to increase, the resulting surge in storage traffic creates bottlenecks in infrastructure. Organizations that provide block storage at scale, from large enterprise data centers to cloud service providers (CSPs) and content delivery networks (CDNs), need solutions to manage this flood of data.
Conventional approaches to the problem use NAND solid state drives (SSDs) as a buffer, absorbing incoming data fast enough to keep network pipes running at full bandwidth. But as network bandwidth and PCI Express speeds advance, NAND cannot keep pace. As a result, applications and services struggle to meet end-user expectations and the organization’s ROI goals.
What is needed is a new approach and technology that does not depend on over-provisioning, a popular but expensive way to increase performance and endurance in modern “disaggregated” storage environments.
Here’s a brief look at traditional approaches and what companies and vendors need to do to position themselves for tomorrow’s cloud storage demands.
- As data volumes continue to explode, data centers are increasing the bandwidth of conduits such as Ethernet fabrics and PCI Express (PCIe) lanes. NAND SSD buffers cannot cope with the resulting storage load, which throttles network performance.
- Thanks to their higher performance and endurance, Intel Optane SSDs can offer far greater efficiency and value in this buffering role than conventional NAND SSDs.
- Cloud service providers, content delivery networks, and companies that manage block storage at scale stand to benefit most from Optane-based buffering.
Data center bandwidth outpaces storage performance
Until a few years ago, it wasn’t a problem for cloud vendors to place storage next to compute in their servers. CPU and memory performance were ample, and 1 GbE and 10 GbE network links were sufficient for the modest amounts of data flowing to those systems. NAND SSDs could write and read data fast enough to keep up with workload demands without the PCI Express bus becoming a bottleneck.
Today, NAND SSDs have improved incrementally. But those improvements pale alongside the doubling of per-lane bandwidth from PCI Express 3.0 to 4.0: an x16 connection now has a one-way bandwidth of roughly 32 GB/s. At the same time, data center network pipes have expanded to 25 GbE, 100 GbE, 200 GbE, and even 400 GbE (although that top speed remains uncommon).
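To make the per-lane doubling concrete, here is a minimal sketch of the arithmetic behind that roughly 32 GB/s figure (the transfer rates and 128b/130b encoding overhead are standard PCIe 3.0/4.0 parameters, not figures from the article):

```python
# One-way PCIe bandwidth: transfer rate per lane, adjusted for the
# 128b/130b line encoding used since PCIe 3.0, times the lane count.
def pcie_one_way_gbs(gigatransfers_per_s: float, lanes: int) -> float:
    encoding_efficiency = 128 / 130   # 128b/130b line code overhead
    bits_per_byte = 8
    return gigatransfers_per_s * encoding_efficiency / bits_per_byte * lanes

print(f"PCIe 3.0 x16: {pcie_one_way_gbs(8.0, 16):.1f} GB/s")   # ~15.8 GB/s
print(f"PCIe 4.0 x16: {pcie_one_way_gbs(16.0, 16):.1f} GB/s")  # ~31.5 GB/s
```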
These bandwidth advances are much needed as data volumes continue to increase. According to Statista, the annual volume of real-time data worldwide did not reach 1 zettabyte (ZB) until 2014; between 2021 and 2022 alone, it will grow by 5 ZB. By 2025, the total volume of real-time data will exceed 50 ZB. This ballooning will be felt in most major data centers, as vendors try to offer ever more real-time analytics, transaction processing, and other high-performance I/O services.
In short, CSPs and CDNs have too much real-time data to keep it all next to the CPUs, even though doing so would deliver the best I/O performance. Data must be distributed across multiple systems. This reality popularized the idea of disaggregating storage from compute, effectively creating large “data lakes.”
The approach also allows IT organizations and service providers to scale storage without adding compute and memory, making capacity expansion more cost-effective. But the network pipes must be fast enough to make high-performance disaggregation viable. Otherwise, the I/O demand created by higher data volumes and larger real-time workloads will bottleneck at the network fabric.
“With the [PCIe] Gen 4 interface and these faster networks, the amount of data you can stream is so large that you need dozens of SSDs to absorb the data coming off the pipe,” explains Jacek Wysoczynski, Intel’s chief product planning manager. “Within this, you want the high-performance SSDs to serve as a buffer in front of the data lake. Suppose each storage box has 24 slots. If you only need two of those to be buffer drives, that’s one thing, but when you need 12, that’s different. Now you risk overflowing the storage box all the time. If that happens, data cannot be written to storage, which stalls network traffic, which temporarily stops the data center. That’s a ‘sky is falling’ moment.”
The situation Wysoczynski describes involves over-provisioning SSDs, which is usually done to improve storage performance and/or endurance. Imagine having an 800 GB SSD but making only 400 GB visible to the host. The hidden capacity can be allocated to activities such as additional garbage collection, which helps improve write performance. It can also help keep utilization below the 50% capacity threshold, above which drive speeds can begin to degrade. An Intel white paper details how over-provisioning can also significantly improve drive endurance. The downside, of course, is the cost of potentially massive amounts of unusable capacity. Without a better alternative, over-provisioning has been the high-performance (if expensive) option for storage buffering. Fortunately, that is changing.
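As a back-of-the-envelope illustration of that trade-off, the sketch below computes the over-provisioning ratio and the stranded capacity cost for the 800 GB example above (the per-GB price is a hypothetical placeholder, not a figure from Intel’s paper):

```python
# Over-provisioning: raw capacity hidden from the host boosts write
# performance and endurance, but that capacity is paid for and unusable.
physical_gb = 800       # raw drive capacity (example from the text)
host_visible_gb = 400   # capacity exposed to the host

hidden_gb = physical_gb - host_visible_gb
op_ratio = hidden_gb / host_visible_gb   # conventional OP formula
price_per_gb = 0.50                      # hypothetical $/GB, illustration only

print(f"Over-provisioning ratio: {op_ratio:.0%}")                  # 100%
print(f"Stranded capacity cost: ${hidden_gb * price_per_gb:.2f}")  # $200.00
```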
The Optane alternative
Since their arrival in 2017, Intel Optane SSDs have provided a better-performing alternative even to enterprise-class NAND SSDs in both write metrics (especially for random workloads) and endurance. In write-intensive, real-time storage configurations, such as a CSP or any sizable data center implementing scalable elastic block storage, Optane SSDs excel at buffering duties. However, increasing bandwidth in data center networks, along with increasing PCIe bandwidth, has changed the dynamics of how storage should be deployed.
Consider the following figures from Intel’s white paper, “Trends and Implications of Distributed Storage for Cloud Storage Planners.” Note the emphasis on reaching 90% network saturation, which data center administrators often consider the “sweet spot” for maximizing bandwidth value.
Image credit: Intel
Given the prevailing technologies of the time, 90% saturation could be achieved on a 25 GbE connection using only two Optane P4800X drives on PCIe Gen 3. A high-performance NAND SSD like the P4610 could not supply as much I/O as the P4800X, but the two weren’t miles apart.
With 100 GbE and PCIe Gen 4, the situation changes significantly. Note that the new 400 GB P5800X delivers impressive performance jumps over the 375 GB P4800X in several key metrics, including 100% sequential write bandwidth (6,200 vs. 2,200 MB/s), random write IOPS (1,500,000 vs. 550,000), and latency (5 vs. 10 µs), all of which support much more storage traffic. So despite the quadrupling of network bandwidth, it takes only three second-generation Optane P5800X SSDs on a PCIe Gen 4 bus to nearly fill that Ethernet link. By contrast, up to 13 current-generation NAND SSDs are needed to supply the same network I/O, depending on the workload. Not surprisingly, the figures roughly double at 200 GbE.
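The drive counts follow from straightforward saturation arithmetic, sketched below. Using the sequential-write figure quoted above gives a lower bound of two P5800X drives; real mixed workloads deliver less per drive, which is how Intel’s paper arrives at three. The roughly 1 GB/s effective NAND figure is an illustrative assumption, not a published spec:

```python
import math

def drives_to_saturate(link_gbps: float, per_drive_write_gbs: float,
                       target: float = 0.90) -> int:
    """Drives needed to absorb `target` fraction of an Ethernet link."""
    link_gbs = link_gbps / 8   # convert Gb/s to GB/s
    return math.ceil(target * link_gbs / per_drive_write_gbs)

# 100 GbE at 90% saturation is 11.25 GB/s of incoming storage traffic.
print(drives_to_saturate(100, 6.2))  # P5800X sequential write: 2 (lower bound)
print(drives_to_saturate(100, 1.0))  # ~1 GB/s effective NAND (assumed): 12
```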
The most important point: Optane can do the job with low-capacity drives while still delivering the performance and endurance that NAND SSDs four to eight times larger (1.6 TB or 3.2 TB) struggle to match.
Another factor to consider is performance consistency. High-volume mixed read/write loads can be especially strenuous for SSDs. Numerous Intel studies have examined how Optane media maintains very low, consistent I/O latency over time when stressed under complex, heavy workloads. In contrast, the responsiveness of NAND SSDs tends to deteriorate under similar conditions, making it difficult to maintain application quality of service.
Intel’s “Distributed Storage Trends” paper discusses a common data center scenario with dense storage racks, 100 Gb/s Ethernet, and 90% I/O saturation. The bottom line is that three Optane P5800X SSDs can do the work of 13 TLC NAND SSD buffers, leaving room for many more mass storage drives per cabinet. Intel claims this yields a “12.6% improvement in the cost per GB of raw storage,” counting both capex and opex savings over three years of power usage.
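Using the 24-slot storage box from Wysoczynski’s example, simple slot arithmetic shows where that capacity headroom comes from (a sketch; real chassis configurations vary):

```python
slots_per_box = 24  # storage box size from Wysoczynski's example

for label, buffer_drives in (("Optane P5800X buffer", 3),
                             ("TLC NAND buffer", 13)):
    remaining = slots_per_box - buffer_drives
    print(f"{label}: {buffer_drives} buffer slots, "
          f"{remaining} left for mass storage")
# Optane leaves 21 slots per box for bulk capacity versus 11 with NAND.
```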
This strategy of using Optane SSDs to provide sufficient buffer performance for the current and future data volumes flowing through expanded I/O conduits will interest CSPs, CDNs, and infrastructure-as-a-service (IaaS) providers that offer storage. That said, the performance and cost advantages of Optane SSDs in this scenario could also apply to compute server clusters with substantial locally attached storage, provided the cluster handles multiple large data sources at once for real-time processing. Optane SSDs can offer greater cost efficiency, higher overall performance, and fewer sources of administrator frustration.
“It’s very common for people to try to optimize workloads and be kind to their SSDs,” says Andrew Ruffin, Intel’s strategy and business development manager. “You can try to stream only sequentially, use deep queues, and so on. But when you have multiple nodes hitting the same data stream, it just turns into random traffic with zero sequentiality. No matter the over-provisioning or whatever else, these multi-tenant environments are going to be hard on storage devices. That’s why it’s essential to have a device optimized for that traffic.”