Container technology and microservices solve many critical challenges, bridging the Dev/Ops divide and helping organizations bring quality applications to production more rapidly than ever before.
As container technology approaches mainstream adoption, however, administrators will expect to see many of the enterprise features available in virtualized environments. Persistent storage is key among them. While containers do a great job of encapsulating application logic, they do not offer a viable solution for storing application data beyond the lifecycle of the container. Ephemeral (or local) storage is not enough: stateful applications require that data remain available beyond the life of the container that houses it.
Modern web-scale applications are expected to be horizontally scalable, resilient, automatable and platform agnostic. Such cloud native architectures present distinct challenges for persistent storage:
- Specialized storage servers are difficult to scale and can become a single point of failure.
- Mapping containers to specific hosts (for data) is undesirable, as containers are expected to be lightweight and portable across hosts.
- Storage needs to tightly integrate with containers and orchestration platforms, with APIs to minimize operator overhead.
Traditional storage solutions were designed to present storage to hypervisors or operating system instances, and were not built to work with cloud native architectures and patterns. Enterprises also have to consider legal, data privacy and lock-in issues when considering public cloud providers and storage array vendors.
There are two predominant storage architectures in use today:
- Centralized storage – traditional hardware appliances are tightly coupled and use vendor specific hardware technology for intra-controller communication, configuration and data plane activity such as cache synchronization and front-end director buses. Compute nodes access this type of storage over a network: previously Fibre Channel fabrics, but today more typically iSCSI for block and NFS for file interfaces. Centralized storage is characterized by scale up topologies and tends to offer deterministic performance. It is a core component in most enterprise on-premises infrastructure deployments.
- Distributed storage – has a stronger software focus and adds flexibility by providing scale out capability. This category includes Object Stores and Distributed Filesystems. There is a mix of software only solutions (including some open source options) as well as more traditional hardware appliances. Many distributed solutions layer other protocols or gateways on top to achieve compatibility with existing infrastructure (e.g. NFS with Gluster, FUSE layers/file gateways with object stores, block interfaces with Ceph).
Distributed architectures have to make design trade-offs that favor either performance over cluster consistency or vice versa. This means that, in general, a distributed storage solution will either be performant at the cost of being only eventually consistent (which leads to data integrity issues for databases) or be strongly consistent at the expense of response times, because every write incurs additional latency.
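The trade-off can be made concrete with a toy model. The sketch below is illustrative only and does not describe any particular product: the `Replica` type, the function names and the latency figures are all hypothetical. It assumes a primary node that sends a write to its replicas in parallel and acknowledges the client after a configurable number of replica acks.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    name: str
    round_trip_ms: float  # hypothetical network round trip from the primary


def write_latency_ms(local_ms, replicas, acks_required):
    """Latency the client sees when the primary waits for `acks_required` acks."""
    if not 0 <= acks_required <= len(replicas):
        raise ValueError("acks_required must be between 0 and len(replicas)")
    if acks_required == 0:
        # Eventually consistent: acknowledge after the local write only.
        return local_ms
    # Acks arrive in parallel, so the wait is set by the slowest required ack.
    fastest_first = sorted(r.round_trip_ms for r in replicas)
    return local_ms + fastest_first[acks_required - 1]


def possibly_stale(replicas, acks_required):
    """Replicas that may still serve old data when the write is acknowledged."""
    fastest_first = sorted(replicas, key=lambda r: r.round_trip_ms)
    return [r.name for r in fastest_first[acks_required:]]


replicas = [Replica("a", 1.0), Replica("b", 2.0), Replica("c", 8.0)]

for acks in (0, 2, 3):
    print(acks, write_latency_ms(1.0, replicas, acks), possibly_stale(replicas, acks))
```

With zero required acks the write returns quickly but every replica may be stale. Requiring all three acks removes the stale window, but the slowest replica now determines the response time of every write.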
Additionally, most distributed systems spread data across all nodes in a cluster, so latency and reliability often degrade as the cluster grows. Any issue that affects a particular node (e.g. disk, server, network) creates a high load to recover and rebuild data, and may impact the majority of the dataset. This creates large and complex failure domains and unpredictable performance profiles.
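A small sketch helps show why the failure domain grows when data is spread across every node. It is a simplified model under stated assumptions, not a description of how any real system places data: the placement functions, node names and volume names are hypothetical, and the subset placement uses a simple round-robin rule chosen only for illustration.

```python
def spread_everywhere(volumes, nodes):
    """Every volume has data on every node in the cluster."""
    return {vol: set(nodes) for vol in volumes}


def place_on_subset(volumes, nodes, copies):
    """Each volume lives on `copies` consecutive nodes (round-robin, illustrative)."""
    return {
        vol: {nodes[(i + k) % len(nodes)] for k in range(copies)}
        for i, vol in enumerate(volumes)
    }


def volumes_hit(placement, failed_node):
    """Volumes that need recovery work when `failed_node` is lost."""
    return {vol for vol, holders in placement.items() if failed_node in holders}


nodes = [f"node{i}" for i in range(10)]
volumes = [f"vol{i}" for i in range(100)]

everywhere = spread_everywhere(volumes, nodes)
subset = place_on_subset(volumes, nodes, copies=2)

print(len(volumes_hit(everywhere, "node0")))  # 100: every volume is affected
print(len(volumes_hit(subset, "node0")))      # 20: only volumes placed on node0
```

In this model, losing one node out of ten touches every volume when data is spread everywhere, but only a fifth of the volumes when each volume is confined to two nodes.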
StorageOS Platform Architecture
StorageOS is designed and built as a hybrid between centralized and distributed architectures. Get the StorageOS platform architecture overview to learn more about how it provides a strongly consistent, deterministically performant storage volume with the flexibility of a scale out distributed storage solution – without any of the inflexibility of centralized solutions or the complexity and poor performance of existing distributed environments.
Author: Alex Chircop
Experienced CTO with a focus on infrastructure engineering, architecture and strategy definition. Expert in designing innovative solutions based on a broad and deep understanding of a wide range of technology areas. Previously Global Head of Storage Platform Engineering at Goldman Sachs and Head of Infrastructure Platform Engineering at Nomura International.