Persistent Storage for Containers: Stateful Apps in Docker

Rethinking Storage in the New World of Containers: The StorageOS Approach

Containers have changed the way we deploy applications to our infrastructure. We’ve gone from virtualization technologies that optimize hardware to containers that optimize not just the hardware, but how organizations develop, deploy and manage the configuration of their applications.

But there’s a problem. Containers, and Docker specifically, use layers to define how images run. An application image can be made up of a number of different layers to reduce the overall image size. On top of those image layers sits a single copy-on-write overlay, which gives the running container a temporary working space. The downside is that once the container is terminated and removed, that data goes with it.


$ docker run -d --name mycontainer myapp:v2

$ docker stop mycontainer

$ docker rm mycontainer

Fundamentally, containers were originally designed to be stateless. They don’t have data persistence, and they can’t maintain data when they’re moved to another node or destroyed.

Storage is Critical

There’s no such thing as a stateless architecture. The state in your application is always stored somewhere, whether in databases, object storage or elsewhere.

As enterprises mature their container and orchestration environments, most do the easy work first by moving stateless apps into the environment. But then comes the question of how they work with storage systems. And storage is critical.

Application data, including databases, message queues and instrumentation, needs dedicated persistent storage. And it’s not just any persistent storage: it needs storage with guaranteed performance. For example:

  • Application binaries need ephemeral, high-performance storage.
  • Application data (e.g. databases, message queues, instrumentation) needs dedicated persistent storage with guaranteed performance: for example, block storage for databases, replication for high availability, snapshots for point-in-time copies and encryption to secure data in the cloud.
  • Configuration needs to be shared and persistent, typically filesystem.
  • Backups may need compression, deduplication, and cloud destinations.

Ultimately, most applications need some sort of storage for storing data to volumes that sit on a file system or block device within an environment.

Containers Need Cloud Native Storage

To maximize the benefits of containers, cloud native storage is needed. Cloud native storage is defined by the container, the Docker environment and the orchestrator. It is:

  • Horizontally scalable
  • No single point of failure
  • Resilient and survivable
  • Minimal operator overhead
  • Decoupled from the underlying platform

There are eight core principles of cloud native storage. You can read those in more detail here.

Benefits of Cloud Native Storage

For those organizations adopting containers, cloud native storage offers significant benefits. One of the key benefits is storage mobility. It gives:

  • Orchestration – Once you have a storage system that can be orchestrated through an API, you get end-to-end movement of not just your application, but also all of its dependencies, including its storage.
  • Persistent Data – A container that uses a volume can move between nodes in the cluster, or even between platforms, and continue to access the same volume as if the data were local.
  • Hotspots – Once you have schedulers and orchestrators in place, you can also take advantage of more advanced scheduling features, such as managing hotspots.
  • Software-Defined – You can now deploy the storage environment anywhere within your infrastructure.

Another key benefit is business continuity and high availability (HA). Most application patterns depend on storage being available across different nodes for high availability, and often for business continuity as well, for example by using storage-level replication across different availability zones, data centers or racks within a data center.

With cloud native storage, you have the ability to quickly recover your applications and databases.

Docker Persistent Storage

Initially Docker persistent storage was really simple.

Docker Persistent Storage - Directory Mounts, Named Volumes, Volume Plugin

You simply mounted an individual directory from the host system. However, this wasn’t practical as the data was tied to the host it ran on. If the container moved to another node, the data wouldn’t be present.
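
For example, a host directory could be bind-mounted into the container from the earlier example (the host path /mnt/data is illustrative):

$ docker run -d --name mycontainer \
-v /mnt/data:/data \
myapp:v2

Anything the container writes to /data lands in /mnt/data on that particular host, which is exactly why the data can’t follow the container to another node.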

The next evolution was named volumes. Now your volumes were referenceable, and you could compose them into your services. You could refer to a volume by name and there was a mapping to where the data resided.
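
A minimal sketch with the default local driver (the volume name is illustrative):

$ docker volume create appdata
$ docker run -d --name mycontainer \
-v appdata:/data \
myapp:v2
$ docker volume ls

The volume now has a name you can reference in commands and compose files, although with the local driver the data itself still lives on a single host.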

Then the next evolution was volume plugins. Volume plugins gave Docker the ability to automatically reach out to external storage providers and integrate them into the same infrastructure as named volumes within standard containers. A Docker volume plugin is integrated into Docker, has been a fundamental part of Docker from version 1.10 onwards, and effectively extends the ecosystem to allow you to use external storage providers. There are dozens of different plugins available.

With an external storage provider you get the ability to persist the data beyond the life of the host, because the data is no longer tied to just an individual host. You can now choose the storage provider that best meets the needs of your application.

One of the nice things about using a Docker volume plugin is that you can continue to use the standard Docker volume commands you were using previously. The only difference is that you specify the --driver option to select which plugin to use. You can create, delete, list, and mount volumes into containers based on just the volume name.
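
A minimal sketch, assuming the StorageOS plugin installed later in this article and an illustrative volume name:

$ docker volume create --driver storageos myvol
$ docker volume ls
$ docker volume inspect myvol
$ docker volume rm myvol

Apart from the --driver flag on create, these are exactly the same commands you would use with local volumes.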

StorageOS – A Docker Volume Plugin

StorageOS is a Docker volume plugin that enables persistent storage for Docker. StorageOS runs as a single container that includes both the data plane, which manages the data path for your volumes, and the control plane, which manages cluster health, configuration and the policy for different volumes, and exposes all the API endpoints needed to integrate natively into the Docker plugin ecosystem, as well as into Kubernetes.

StorageOS Control Plane and Data Plane

A Docker volume plugin gives you all the API endpoints for Docker volume operations. There is an API between the Docker engine and the external provider for operations such as create, mount and delete. The plugin allows you to dynamically create or mount a volume, which the Docker engine then integrates natively into the Docker namespace.
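
For reference, the legacy (socket-based) volume plugin protocol is a set of JSON-over-HTTP endpoints such as /VolumeDriver.Create, /VolumeDriver.Mount and /VolumeDriver.Remove, discovered through a Unix socket under /run/docker/plugins/. Below is a rough sketch of the kind of call the engine makes; the socket path and volume name are illustrative, and a managed plugin installed with docker plugin install exposes its socket elsewhere:

$ curl --unix-socket /run/docker/plugins/storageos.sock \
-H "Content-Type: application/json" \
-d '{"Name": "myvol", "Opts": {}}' \
http://localhost/VolumeDriver.Create
{"Err": ""}

An empty Err field means the volume was created; a subsequent /VolumeDriver.Mount call returns the Mountpoint that the engine maps into the container.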

You can attach one or more volumes and specify the paths where you want those volumes mounted within the container. Those volumes now become part of the container namespace.
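
For instance, with the Docker CLI this might look like the following (the volume names, mount paths and image are illustrative):

$ docker run -d --name mycontainer \
--volume-driver storageos \
-v appdata:/data \
-v applogs:/logs \
myapp:v2

Both volumes are created by the plugin if they don’t already exist and appear inside the container at /data and /logs.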

Docker Volume Plugin to Create Highly Available WordPress

As a small example, you can use a Docker volume plugin to create a highly available WordPress deployment. It’s a really simple setup with two services: a database and the WordPress front end. First, install the StorageOS plugin:

$ docker plugin install --alias storageos storageos/plugin
Plugin "storageos/plugin" is requesting the following privileges:
- network: [host]
- mount: [/var/lib/storageos]
- mount: [/dev]
- device: [/dev/fuse]
- allow-all-devices: [true]
- capabilities: [CAP_SYS_ADMIN]
Do you grant the above permissions? [y/N]
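
Once the plugin is installed and enabled, you can check its status before creating any volumes:

$ docker plugin ls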

The first service is a database, which uses StorageOS as an external volume driver to mount a volume into /var/lib/mysql of a MySQL container.  It allows you to create a MySQL instance, which Swarm can schedule on any of the nodes within your cluster.

$ docker service create \
--mount type=volume,src=db,dst=/var/lib/mysql,volume-driver=storageos \
--name db \
--replicas 1 \
--network wp \
--publish 3306:3306 \
--detach=true \
-e MYSQL_ROOT_PASSWORD=wordpress \
-e MYSQL_PASSWORD=wordpress \
-e MYSQL_USER=wordpress \
-e MYSQL_DATABASE=wordpress \
percona:5.7 \
--ignore-db-dir=lost+found

$ docker service create \
--name wp \
--network wp \
--publish 80:80 \
--mode global \
--detach=true \
-e WORDPRESS_DB_USER=wordpress \
-e WORDPRESS_DB_PASSWORD=wordpress \
-e WORDPRESS_DB_HOST=db:3306 \
-e WORDPRESS_DB_NAME=wordpress \
wordpress:latest

If your MySQL instance dies, Docker Swarm will transparently restart it or move it to another node in the cluster, and will re-route networking to the new instance. This is a very simple example of what you can do once you have highly available databases enabled by a highly available storage layer.
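
One way to see this behavior, assuming a multi-node Swarm and an illustrative node name, is to drain the node running the database and watch the task get rescheduled:

$ docker node update --availability drain node1
$ docker service ps db

Because the db volume is provided by StorageOS rather than the local host, the rescheduled task can reattach to the same data from its new node.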

A Few Things to Remember About Plugins

  • With an orchestrator like Kubernetes, the orchestrator will have an interface to manage the storage system and how volumes map to applications
  • Not all plugins are simple – many are just interfaces for other frameworks or subsystems
  • There are many options for storage with different use cases, and not all storage systems provide the functionality needed by a cloud native, microservices-based application
  • Pre-provisioned volumes may provide some of the benefits, but not the flexibility in the long run
  • Diverse ecosystem: 95+ plugins, 5 interfaces, 6+ frameworks

There are many options for storage, and just because there’s a plugin for storage, doesn’t necessarily mean it supports all of the services that you might expect out of your Docker environment. For example, some legacy storage systems might not support things like dynamic provisioning or be able to move volumes easily across nodes.

A Diverse Ecosystem

We are operating in a diverse ecosystem with many plugins and different interfaces and lots of different frameworks. The great news is the CNCF and the different orchestrator platforms are working together to create a new standard called CSI, the container storage interface. CSI aims to standardize the methods storage orchestrators use to talk to different storage systems. Watch this space for more to come.


Author: Alex Chircop

Experienced CTO with a focus on infrastructure engineering, architecture and strategy definition. Expert in designing innovative solutions based on a broad and deep understanding of a wide range of technology areas. Previously Global Head of Storage Platform Engineering at Goldman Sachs and Head of Infrastructure Platform Engineering at Nomura International.
