Surviving an AWS outage: Multi-Region Storage

On February 28, 2017, Amazon’s S3 web-based storage service experienced widespread issues that disrupted thousands of websites and hosted services. According to the post-mortem, an incorrect command had a cascading impact on the S3 services across Amazon’s largest and most popular region. If your service was affected, consider what changes you might make to your architecture and how StorageOS can help you cope with outages.

Understanding the impact of AWS outages

Architecting for reliability requires planning for region-wide outages.

All major public cloud providers (AWS, Google Cloud and Microsoft Azure) experience serious outages every few years. Each time, their customers lose revenue, suffer service disruption, and occasionally lose critical data. The cloud providers lose revenue too: this outage is expected to have a two percent impact on AWS’s first-quarter revenue.

Not all services were affected by the most recent outage, however. So how did some services survive with no disruption?

Historically, most AWS users have deployed S3 in a single region because of the cost and complexity of multi-region architectures. Even though S3 touts 11 nines (99.999999999%) of durability, a single-region deployment is still a single point of failure: durability describes how unlikely stored data is to be lost, not whether that data can be reached during an outage. A single region provides an availability SLA of only three nines (99.9%). If you need more than 99.9% availability, consider a multi-region implementation to reduce the risk of outages disrupting your service.
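To put the three-nines figure in perspective, the short calculation below converts an availability percentage into permitted downtime per year, and estimates the combined availability of several regions. It is a rough sketch rather than a statement about any provider's SLA: the function names are illustrative, and the combined figure assumes regions fail independently of one another, which real outages do not always respect.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def downtime_hours_per_year(availability: float) -> float:
    """Hours per year a service may be down at the given availability (0..1)."""
    return (1.0 - availability) * HOURS_PER_YEAR


def combined_availability(availability: float, regions: int) -> float:
    """Availability when the service is up if at least one region is up.

    Assumption: regions fail independently, so the chance that all of
    them are down at once is the product of the individual chances.
    """
    return 1.0 - (1.0 - availability) ** regions


single = 0.999  # three nines, the single-region SLA quoted above
dual = combined_availability(single, 2)

print(f"one region:  {downtime_hours_per_year(single):.2f} hours/year of downtime")
print(f"two regions: {downtime_hours_per_year(dual) * 60:.2f} minutes/year of downtime")
```

Under these assumptions a single region at 99.9% may be unavailable for about 8.76 hours a year, while two independent regions would reduce the expected figure to well under an hour. Treat the second number as an upper bound on the benefit, since correlated failures and failover time both eat into it.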

How StorageOS Supports Multi-Region Services

StorageOS helps maintain service levels even during outages and makes disaster recovery seamless.

A multi-region service backed by StorageOS replicates data across regions. In the event of an outage, data is automatically served from another region through a process called failover. Your application infrastructure does not need complex updates: with StorageOS, the data service remains available throughout and recovery is seamless.
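The general idea of replication plus failover can be illustrated with a small, self-contained sketch. This is not the StorageOS API or its implementation: `RegionStore` and `ReplicatedStore` are hypothetical in-memory stand-ins, and the region names are only examples. Writes go to every reachable region, and reads fall through to the next region when the preferred one is down.

```python
class RegionUnavailable(Exception):
    """Raised when a region (or every region) cannot serve a request."""


class RegionStore:
    """Hypothetical in-memory stand-in for the storage in one region."""

    def __init__(self, name):
        self.name = name
        self.healthy = True  # flip to False to simulate a regional outage
        self._data = {}

    def put(self, key, value):
        if not self.healthy:
            raise RegionUnavailable(self.name)
        self._data[key] = value

    def get(self, key):
        if not self.healthy:
            raise RegionUnavailable(self.name)
        return self._data[key]


class ReplicatedStore:
    """Replicates writes to all regions and fails over on reads."""

    def __init__(self, regions):
        self.regions = list(regions)  # ordered by read preference

    def put(self, key, value):
        # Write to every reachable region. A region that is down misses
        # this write; a real system would resynchronise it on recovery.
        written = 0
        for region in self.regions:
            try:
                region.put(key, value)
                written += 1
            except RegionUnavailable:
                continue
        if written == 0:
            raise RegionUnavailable("all regions")

    def get(self, key):
        # Failover: try regions in order and return the first answer,
        # together with the name of the region that served it.
        for region in self.regions:
            try:
                return region.get(key), region.name
            except RegionUnavailable:
                continue
        raise RegionUnavailable("all regions")


store = ReplicatedStore([RegionStore("us-east-1"), RegionStore("us-west-2")])
store.put("index.html", b"<h1>hello</h1>")

store.regions[0].healthy = False  # simulate an outage in the first region
value, served_by = store.get("index.html")
print(served_by, value)
```

The caller's code does not change when the first region fails, which is the property the paragraph above describes. The sketch deliberately leaves out the hard parts, such as resynchronising a recovered region and handling conflicting writes.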

The benefit is a reduced risk of service interruption and data loss.

Whether you use a cloud provider or run on-prem, you should consider how region-wide or data center failures would affect your services. StorageOS simplifies creating a multi-region architecture, making it much easier to avoid the disruption, cost and reputational damage suffered by companies during the latest outage.

Register for the beta now, and be the first in line to try StorageOS for free.


Author: Cheryl Hung

Cheryl Hung is the Director of Ecosystem at the Cloud Native Computing Foundation. Cheryl codes, writes and speaks about storage, containers and infrastructure. Cheryl previously worked at StorageOS as product manager and as a Google Maps software engineer, with particular expertise in mapping and geolocation services, C++, Java and Python. She graduated from the University of Cambridge with a Masters in Computer Science and has worked in London and New York.
