Software-defined storage is not a new concept, at least in the context of the computer science world. The term was popularized by VMware in 2013 and recognized by the Storage Networking Industry Association in 2014, but the idea behind software-defined storage is older—it was just called storage virtualization. Software-defined storage is simply a way to add a layer of abstraction between the application and the data storage hardware. It’s used in both an on-premises and public cloud context to overcome storage-related limitations and increase operational control. However, as enterprises migrate to a cloud-native environment, software-defined storage has to evolve to meet the needs of modern applications.
Software-defined storage is particularly important for running cloud-native, containerized applications, but it isn’t exclusively a cloud-native phenomenon. Using software layers over data center hardware provides many of the same benefits—no lock-in, better flexibility, higher performance—as cloud-based software storage layers.
According to the Storage Networking Industry Association, there are four key components to software-defined storage: automation, API-based interfaces, virtualized data paths and scalability. Here’s a look at why each part of the software-defined storage picture is important in modern application development, along with a couple of related considerations, like avoiding vendor lock-in.
Let’s start by unpacking the term ‘software-defined storage.’ This term is more descriptive than it seems—software is being used to define the application’s storage requirements, in contrast to a storage administrator defining the application’s storage resources.
Software-defined storage gives application developers the ability to use an interface to describe their application’s storage requirements (more on that below). The software then automatically procures the resources and maps services to their storage resources.
Automation

Especially in a dynamic, containerized cloud environment, automation becomes even more essential; it is really the key to software-defined storage. Dynamically mapping containers to storage, connecting to and disconnecting from storage resources, and handling errors and failovers automatically, without application downtime, is simply not achievable by hand. Using software to handle data storage provisioning as well as management makes it possible to run stateful applications in containers, and to do so without error-prone, time-consuming manual intervention.
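To make the automation idea concrete, here is a minimal sketch, in Python, of the bookkeeping a software-defined storage layer performs on a workload’s behalf. The class and its names are invented for illustration; real systems track far more state, but the core loop of attach, detach and reattach as containers move is the same.

```python
# Illustrative sketch (not a real product API): software-defined storage
# automates the mapping from workload to volume, so a scheduler can attach
# and detach storage without manual steps.

class VolumePool:
    """Tracks free capacity and which volume each workload is using."""

    def __init__(self, capacity_gb):
        self.free_gb = capacity_gb
        self.attachments = {}   # workload name -> (volume id, size)
        self._next_id = 0

    def attach(self, workload, size_gb):
        if size_gb > self.free_gb:
            raise RuntimeError("pool exhausted")
        self._next_id += 1
        vol = f"vol-{self._next_id}"
        self.free_gb -= size_gb
        self.attachments[workload] = (vol, size_gb)
        return vol

    def detach(self, workload):
        vol, size_gb = self.attachments.pop(workload)
        self.free_gb += size_gb
        return vol

pool = VolumePool(capacity_gb=100)
vol = pool.attach("orders-db", size_gb=20)    # container scheduled
pool.detach("orders-db")                      # container rescheduled elsewhere
vol2 = pool.attach("orders-db", size_gb=20)   # reattached automatically
```

A human could perform each of these steps, but not hundreds of times a day across a cluster, and not without mistakes; that is the gap automation closes.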
API-Based Interfaces

Software-defined storage uses APIs to provision, manage and maintain storage resources. This allows developers to provision storage through an interface rather than directly, either on data center hardware or in a cloud provider’s storage systems. The combination of APIs and automation also makes integration with container orchestration tools like Kubernetes possible, so workloads can be deployed and managed in concert with their storage.
This makes it easier for developers to control storage provisioning. They don’t need to go through a storage administrator at all, but can provision an application’s storage themselves, immediately. At the same time, using API interfaces cuts down on the manual steps in storage provisioning, and thus on the likelihood of errors. The level of storage-related expertise needed to provision storage is also lower, freeing up storage specialists for other tasks.
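The pattern behind these API-based interfaces is declarative: the developer states what the application needs, and a control loop, not a storage administrator, decides how to satisfy it. The sketch below illustrates that pattern; the field names and tier labels are hypothetical, not any particular product’s schema, though readers familiar with Kubernetes will recognize the claim-and-bind shape.

```python
# Hypothetical sketch of declarative storage provisioning: a claim records
# the developer's requirements; a reconcile loop binds it to a matching
# backend when one is available. All names here are invented.

from dataclasses import dataclass

@dataclass
class StorageClaim:
    name: str
    size_gb: int
    tier: str            # e.g. "fast-ssd" or "archive"
    status: str = "Pending"
    volume: str = ""

def reconcile(claim, available_tiers):
    """Bind a pending claim to a backend that matches the requested tier."""
    if claim.status == "Bound":
        return claim
    backend = available_tiers.get(claim.tier)
    if backend is None:
        return claim              # stays Pending until a backend appears
    claim.volume = f"{backend}/{claim.name}-{claim.size_gb}g"
    claim.status = "Bound"
    return claim

claim = StorageClaim(name="orders-db", size_gb=50, tier="fast-ssd")
reconcile(claim, {"fast-ssd": "nvme-pool-1"})
```

Note that the claim simply stays pending if no suitable backend exists yet; the developer’s request and its fulfillment are decoupled in time, which is what makes self-service provisioning safe.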
Virtualized Data Paths
Data virtualization is also a core piece of any software-defined storage system. A translation layer is added between your application and your storage resources, and all of your storage resources are combined into one storage pool. Pooling storage resources also makes it much easier for developers to connect applications to the type of storage best suited to the particular use case, rather than whatever generic storage happens to be available.
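A small sketch makes the pooling idea concrete: several physical backends are presented as one virtual pool, and a placement function matches each workload to the medium that suits its access pattern. The backends, attributes and placement policy below are all hypothetical.

```python
# Sketch of a virtualized data path: heterogeneous backends appear as one
# pool, and placement picks a backend by requirement rather than by whatever
# happens to be free. Backend names and figures are invented.

backends = [
    {"id": "nvme-0", "media": "nvme", "free_gb": 200,  "iops": 500_000},
    {"id": "ssd-3",  "media": "ssd",  "free_gb": 800,  "iops": 90_000},
    {"id": "hdd-12", "media": "hdd",  "free_gb": 8000, "iops": 300},
]

def place(size_gb, min_iops):
    """Pick a backend that fits the size and performance requirement."""
    candidates = [b for b in backends
                  if b["free_gb"] >= size_gb and b["iops"] >= min_iops]
    # Prefer the slowest medium that still meets requirements, keeping the
    # fastest media free for workloads that genuinely need them.
    return min(candidates, key=lambda b: b["iops"])["id"] if candidates else None

hot = place(size_gb=50, min_iops=100_000)   # latency-sensitive database
cold = place(size_gb=500, min_iops=100)     # log archive
```

The point is not the particular policy but that a policy exists at all: without the pooling layer, the database and the log archive would both land on whichever disk the administrator happened to assign.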
Data virtualization also increases utilization and density for each instance. Without an abstraction layer, running a microservices-based application would be cost-prohibitive: you would have to provision far more resources than you actually need. Using software-defined storage makes it possible to connect hundreds of pods to a virtual machine and make better use of your storage resources.
Adding a layer of virtualization or abstraction between the application and the storage removes many of the constraints on physical storage use and increases application flexibility.
Scalability

Software-defined storage offers the ability to automatically and dynamically scale storage infrastructure up or down as needed, just as container orchestration tools dynamically scale compute resources. While developing in a cloud-native environment already makes storage provisioning dramatically easier than in a data center, using software-defined storage to manage Amazon, Google or Azure’s native storage resources automates scalability entirely.
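The scaling logic itself can be very simple, which is part of the appeal. Here is a toy sketch of a storage autoscaler: when pool utilization crosses a high-water mark, capacity is grown in fixed steps until it falls back below the mark. The threshold and step size are arbitrary values chosen for illustration.

```python
# Toy sketch of storage autoscaling: expand capacity whenever utilization
# crosses a high-water mark, mirroring how container orchestrators scale
# compute. Threshold and step size are arbitrary.

def scale(capacity_gb, used_gb, high=0.8, step_gb=100):
    """Grow capacity until utilization drops below the high-water mark."""
    while used_gb / capacity_gb > high:
        capacity_gb += step_gb
    return capacity_gb

cap = scale(capacity_gb=100, used_gb=90)   # 90% used -> pool expands
```

A real system would also scale down, rate-limit expansions and respect quotas, but the control loop is the same shape: observe utilization, compare to policy, act without a human in the path.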
Avoiding Vendor Lock-In

The compute part of modern applications is very portable and, in most cases, easy to move between public clouds or to a private cloud, especially if the application uses containers. But since applications also require data, your application can only be as portable as your data, and if you’re not using a software-defined storage abstraction layer, moving your data between public clouds or to a private cloud can be costly. Data is usually what tethers companies to a particular cloud environment (or to a particular hardware manufacturer).
Software-defined storage gives companies the ability to move data as easily as the rest of the application by creating a platform-agnostic translation layer between the storage and the application. This makes pursuing a multicloud strategy feasible, and it also gives enterprise-level companies leverage in pricing and feature negotiations with cloud providers.
Data is not generally mobile, but in the context of building containerized, stateful applications, it has to learn to move. Storing data directly on servers or on AWS, Azure or Google storage services makes it impossible for your data to follow containers as they move around clusters. Without the ability to make your data as nimble as the applications that need to access it, running any kind of stateful, containerized application is risky. And the risks are serious: data accessibility is one of the main reasons applications fail.
Unlike compute, storage and the servers used to provide it degrade with time. If an application stores all of its data on individual servers, you have to make sure that none of those servers fail. You have to treat each server like a cherished pet. But servers are not pets; they are cattle. You don’t want to be in a position where you’re trying to nurse a server well past its ‘natural’ lifespan because moving the data to another location is time-consuming.
You also don’t want your data to be put at risk by a single point of failure like a server malfunction. Using a layer of abstraction means that if a server fails, your data is safe and can move seamlessly to another resource, without the need for human intervention. It can also allow for self-healing, with monitoring for both errors and performance problems caused by storage issues.
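One common way an abstraction layer delivers this resilience is replication: each volume keeps copies on several nodes, so a node failure neither loses data nor requires an operator. The sketch below is a minimal illustration of that idea, with invented names; production systems add consistency protocols, rebuild throttling and much more.

```python
# Minimal sketch of replication-based failover: a volume keeps replicas on
# several nodes; when a node dies, reads continue from a surviving replica
# and a fresh copy is rebuilt on a spare node, with no human intervention.

class ReplicatedVolume:
    def __init__(self, data, nodes, replicas=2):
        self.copies = {n: data for n in nodes[:replicas]}
        self.spare_nodes = list(nodes[replicas:])

    def node_failed(self, node):
        self.copies.pop(node, None)
        if self.spare_nodes:                     # self-heal: rebuild a copy
            survivor = next(iter(self.copies.values()))
            self.copies[self.spare_nodes.pop(0)] = survivor

    def read(self):
        return next(iter(self.copies.values()))  # any replica serves reads

vol = ReplicatedVolume(b"orders", nodes=["n1", "n2", "n3"])
vol.node_failed("n1")                            # reads keep working
```

The application never notices the failure: the read path simply lands on a surviving replica while the layer restores the desired replica count in the background.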
Regardless of whether your application is deployed in a public or private cloud, and whether your data is stored on-site or in the cloud, many of the challenges remain the same. Even if your company does not need to buy an appliance to handle storage, your data is stored on a physical machine somewhere—yours, Amazon’s or Google’s. The challenges inherent in connecting applications—especially dynamic, containerized, microservices-based applications—to data storage are similar.
Software-defined storage solves many of these data storage-related pain points, making it possible to densely pack volumes into a single instance, to reduce human error in storage management and to increase your data’s resiliency. Building a modern, containerized, microservices-based application without a software layer between the application and the storage resources simply isn’t a best practice. The limits on the number of volumes per instance would make an application unnecessarily expensive to run, for one thing. There are too many manual steps, too many places where things could go wrong and too little visibility and monitoring of storage-related issues to make it feasible to run customer-facing applications without a software storage layer.
By allowing developers to define their applications’ storage needs and procure the storage themselves, software-defined storage also speeds up the development process. At the same time, the ability to procure the best type of storage for the application’s needs helps improve performance.
In some organizations, storage is treated like an afterthought—something that perhaps stems from the common practice of building an application, then contacting a storage admin to figure out the storage options. Software-defined storage makes it possible to break down that silo, addressing storage issues as a part of the development process rather than something to handle later. Storage is essential to most applications and treating it with the importance it deserves increases application quality and resilience.
Author: Alex Chircop
Experienced CTO with a focus on infrastructure engineering, architecture and strategy definition. Expert in designing innovative solutions based on a broad and deep understanding of a wide range of technology areas. Previously Global Head of Storage Platform Engineering at Goldman Sachs and Head of Infrastructure Platform Engineering at Nomura International.