Traditionally, enterprise storage systems have been designed as appliances: devices attached to the enterprise network or living in the SAN. This mirrored the traditional compute approach, in which applications live on dedicated servers (virtual or physical).
Cloud computing changed the way we think about computing infrastructure. How any particular server is built no longer matters; what matters far more is how the compute elements are connected, and how their aggregate power creates a coherent, flexible, and scalable computing system. For this reason, much more intellectual energy has gone into the design of orchestration infrastructures (such as Kubernetes). In addition, big-iron boxes are less common in cloud environments, where finer-grained elements offer greater flexibility and more control over scaling.
When designing a cloud storage system, a similar approach applies. Instead of isolated, coarse-grained devices, it is desirable to use distributed, cloud-native resources and combine their power and functionality into a cloud-native storage system. To get the most out of a cloud environment, cloud storage must be designed around a different set of principles than those used in appliance-based systems.
New paradigm, new challenges
Storage brings a set of challenges that compute and network infrastructure do not face.
For decades, traditional storage systems have been designed around dedicated components such as battery-backed write caches, multipath-capable storage media, hardware-assisted RAID, and dedicated backplanes or switching infrastructure. These components, while essential to delivering enterprise-class performance and reliability, are rarely available in clouds and cannot be easily integrated into them.
Giving up performance and reliability, however, is not an acceptable alternative. A true enterprise-class cloud storage system must forgo these dedicated hardware components without sacrificing reliability or performance.
Think outside the box
While grappling with the above challenges, the design of a cloud storage system can also benefit from some properties of cloud environments. When data is processed for storage, processing is no longer limited to the resources confined to a single box. The cloud lets us use its elastic memory and compute resources to whatever extent is necessary. Unused resources are not wasted; they can be used for other, non-storage tasks.
It is also possible to consume more resources per transaction than would normally be consumed on an appliance, as long as doing so yields some benefit, such as better performance, better data management, or better economies of scale.
In appliance-based systems, cost and performance optimization is achieved by manually tuning data paths and caches to extract the best possible performance from the fixed resources available in the box. Cloud systems, on the other hand, require a different approach. Resources are not fixed; rather, they are elastic and can grow or shrink with demand.
Performance requires a new method
Although the latency of a given transaction is bounded by network and media physics, IOPS (input/output operations per second) and bandwidth can scale almost infinitely. Care must be taken not to limit this scalability; with this in mind, the desired approach to cost/performance optimization is to reduce the resources needed to accomplish a given amount of work.
At cloud scale, data caches are highly inefficient and offer limited benefit, if any. Instead, any available cache resources should be used to cache metadata. Here, the distributed nature of the cloud calls for a distributed, scalable form of metadata cache.
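One way a distributed metadata cache can scale with the cloud is to spread entries across cache nodes by hashing the metadata key. The sketch below is purely illustrative; the class, node names, and API are assumptions, not anything the article describes in detail.

```python
import hashlib


class DistributedMetadataCache:
    """Toy sketch of a distributed metadata cache: each metadata key is
    hashed to pick an owning cache node, so capacity and throughput grow
    with the node count. Illustrative only, not a production design."""

    def __init__(self, node_names):
        # One in-memory dict stands in for each cache node.
        self.nodes = {name: {} for name in node_names}
        self.names = sorted(node_names)

    def _node_for(self, key):
        # Deterministic hash: the same key always routes to the same node.
        h = int(hashlib.sha256(key.encode()).hexdigest(), 16)
        return self.names[h % len(self.names)]

    def put(self, key, metadata):
        self.nodes[self._node_for(key)][key] = metadata

    def get(self, key):
        return self.nodes[self._node_for(key)].get(key)
```

A real system would use consistent hashing so that adding or removing a node remaps only a fraction of the keys, but the routing idea is the same.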
Leveraging cloud resources allows true cloud storage systems to do more work at data ingest (when new data is added to the system). This reduces the total amount of work over time, since it avoids the need to scan the data later for tasks such as deduplication. The availability of many cores in cloud environments allows some ingest tasks to be performed in parallel, reducing latency.
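The ingest-time deduplication idea can be sketched as content-addressed chunk storage: each chunk is hashed as it arrives, so duplicates are caught immediately rather than by a later background scan. This is a minimal sketch under assumed names; it is not Ionir's actual implementation.

```python
import hashlib


class DedupingIngest:
    """Toy sketch of inline deduplication: hash each chunk at ingest
    time and store the physical bytes only once per unique hash."""

    def __init__(self):
        self.store = {}         # chunk hash -> physical chunk bytes
        self.logical_refs = []  # logical write order, as chunk hashes

    def ingest(self, chunk: bytes) -> bool:
        """Record a logical write; return True if the chunk was new."""
        digest = hashlib.sha256(chunk).hexdigest()
        is_new = digest not in self.store
        if is_new:
            self.store[digest] = chunk  # one physical copy per content
        self.logical_refs.append(digest)
        return is_new
```

Because the hash is computed per chunk, this work also parallelizes naturally across the many cores the article mentions.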
Horizontal scaling is natural in cloud environments. Cloud storage can scale seamlessly when designed around lock-free data structures and, as far as possible, synchronization-free designs that allow massively parallel, highly concurrent operation.
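One common way to avoid synchronization, sketched below under assumed names, is to partition work by key hash so that each shard is owned by exactly one worker; no worker ever touches another worker's shard, so no locks are needed. (This illustrates the synchronization-free idea in general, not the article's specific lock-free data structures.)

```python
import hashlib
from concurrent.futures import ThreadPoolExecutor


def shard_of(key: str, num_shards: int) -> int:
    # Deterministic hash: the same key always lands on the same shard.
    return int(hashlib.sha256(key.encode()).hexdigest(), 16) % num_shards


def ingest_partitioned(keys, num_shards=4):
    """Process keys in parallel with no locks: each shard's dict is
    written by exactly one worker, so workers never contend."""
    shards = [dict() for _ in range(num_shards)]
    buckets = [[] for _ in range(num_shards)]
    for k in keys:
        buckets[shard_of(k, num_shards)].append(k)

    def work(i):
        for k in buckets[i]:
            shards[i][k] = len(k)  # stand-in for real per-key work

    with ThreadPoolExecutor(max_workers=num_shards) as ex:
        list(ex.map(work, range(num_shards)))
    return shards
```

Because ownership is decided up front by the hash, adding more shards and workers scales the work out without adding coordination.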
A new approach to storage and data management
Cloud computing automates many of the management tasks that govern IT resources. Similarly, true cloud storage should automate storage resource management.
Ideally, an administrator should only need to specify the SLAs and QoS required by an application, the desired presentation form for the data, and security settings (ACLs, encryption, etc.). Based on this, the system should automatically manage resources to meet those requirements.
The way data is handled also needs to change. Cloud storage systems manage large pools of data. At the low level, these pools should be represented in such a way that the data does not depend on the access methods used by applications. With this in mind, the traditional separation between primary and secondary storage becomes redundant.
With data pooled and available to any application, in any presentation form, the system automatically determines the physical location (or storage tier) of the data according to the QoS configuration. The system automatically manages redundancy, disaster recovery, and data mobility; primary and secondary storage simply become different tiers within the same system.
This eliminates the need to create unnecessary copies of data sets, as is often the case with copy data management systems. There is no need to duplicate properties and requirements; data storage and data management can now live in the same framework.
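A QoS-driven tier decision of the kind described above can be sketched as a simple policy function. The tier names, thresholds, and QoS fields below are invented for illustration; real systems would use their own policies.

```python
def place(qos: dict) -> str:
    """Hedged sketch: map a volume's QoS settings to a storage tier,
    instead of an administrator hand-placing data on 'primary' or
    'secondary' storage. All names and thresholds are assumptions."""
    latency = qos.get("max_latency_ms", float("inf"))
    if latency <= 1:
        return "nvme"            # hot tier for latency-sensitive data
    if latency <= 20:
        return "ssd"
    if qos.get("archival", False):
        return "object-archive"  # what used to be "secondary" storage
    return "hdd"
```

Under such a policy, primary and secondary storage are just different return values of the same placement function, which is the point the article makes about tiers within one system.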
Access data differently
Cloud IT hosts a wide variety of applications, many of which require different data access methods (block/file/object, etc.). Classifying a system by access method (e.g., NAS) creates artificial and unnecessary boundaries between data sets, undermining two of the cloud's main goals: economies of scale and flexibility.
To avoid this, a true cloud system should be able to present data in multiple forms. This can be achieved using intelligent data structures that abstract the data and present it independently of the access method.
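The separation of data from access method can be sketched as one underlying byte store exposed through different protocol-style views. The class and methods below are illustrative assumptions, not Ionir's actual interface.

```python
class DataObject:
    """Toy sketch: a single underlying byte payload presented through
    two access-method views, block-style and object-style, so the data
    itself does not depend on how applications access it."""

    BLOCK_SIZE = 4  # artificially small, for illustration

    def __init__(self, data: bytes):
        self.data = data  # the one abstract representation

    def read_block(self, lba: int) -> bytes:
        # Block view: fixed-size addressing by logical block number.
        start = lba * self.BLOCK_SIZE
        return self.data[start:start + self.BLOCK_SIZE]

    def read_object(self) -> bytes:
        # Object view: the whole payload as a single value.
        return self.data
```

A file view would be a third method over the same bytes; none of the views requires its own copy of the data, which is what removes the artificial boundaries between data sets.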
To take advantage of the many benefits of cloud environments on the one hand, and to overcome the challenges the cloud poses on the other, cloud storage systems must break away from the principles and practices of traditional storage. A well-designed cloud-native storage system should deliver a full set of enterprise-class features and performance without sacrificing the benefits of the cloud.
About the author: Nir Peleg is the CTO and co-founder of Ionir, which develops a Kubernetes-based storage system. Nir is responsible for the company's strategic technology roadmap and intellectual property management. Prior to Ionir, Nir founded Reduxio and led the transition from Reduxio's appliance-based product technology to Ionir's software-defined cloud-native storage. With over 30 years of industry experience, Nir was CTO and co-founder of Montilio, an innovative file server acceleration company, and EVP of R&D and CTO at Exanet, which built one of the world's first distributed NAS systems. Nir was the first employee and chief architect at Digital Appliance, Larry Ellison's massively parallel computing company that eventually became Pillar Data Systems (acquired by Oracle). Nir holds more than 20 U.S. patents and pending patents in the areas of computing, distributed storage, data deduplication, and encryption.