Cloud Prediction: High Availability with a Chance of Downtime

By Beth Cohen - August 22, 2013

Selling disaster recovery services in New York used to be easy. But almost twelve years after 9/11, have business continuity and disaster recovery lost their place in enterprise IT budgets?

In the past, disaster recovery meant either paying for expensive data replication services to remote data center hot sites or purchasing full sets of redundant hardware. While plenty of companies with deep IT pockets were willing to pay for that peace of mind, others went looking for more cost-effective approaches to achieving the desired objective of minimizing downtime and data loss.

Even as cloud services have gained popularity, their dirty secret is that they are relatively cheap because they are built for economies of scale, not for customer high availability or reliability. The service level agreements (SLAs), if they exist at all, are shockingly bad. A 99.95 percent uptime guarantee, the Amazon Web Services (AWS) standard offering, might not seem so awful, but that translates to roughly 22 minutes of permitted downtime in a 30-day month.
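The downtime an uptime percentage permits is simple arithmetic, and it is worth checking rather than taking on faith. The following is a minimal sketch, assuming a 30-day month; the function name `allowed_downtime_minutes` is made up for illustration and is not part of any provider's tooling.

```python
def allowed_downtime_minutes(uptime_percent: float, days: float = 30.0) -> float:
    """Return the minutes of downtime an uptime guarantee permits over `days` days."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    total_minutes = days * 24 * 60  # a 30-day month has 43,200 minutes
    return total_minutes * (100.0 - uptime_percent) / 100.0


if __name__ == "__main__":
    # Compare a few common guarantee levels over an assumed 30-day month.
    for sla in (99.0, 99.9, 99.95, 99.99):
        minutes = allowed_downtime_minutes(sla)
        print(f"{sla}% uptime permits {minutes:.1f} minutes of downtime per month")
```

For a 99.95 percent guarantee this works out to about 21.6 minutes a month, or a little over four hours across a year.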

When you read the fine print and factor in that the service level agreement ONLY applies inside the provider’s data center, it quickly becomes apparent that cloud services are rife with single points of failure and potential outages. Just because application instances are working in the data center does not mean customers can access them.
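To see why a data-center-only guarantee says little about what users experience, consider every link between the application and the user as a component that must be up at the same time. The sketch below assumes the components fail independently, which is a simplification, and the network figures in the usage example are hypothetical placeholders, not measured values or published SLAs.

```python
from math import prod


def end_to_end_availability(component_availabilities):
    """Availability of a chain in which every component must be up at once.

    Assumes independent failures, so the combined availability is the
    product of the individual availabilities (each a fraction in [0, 1]).
    """
    values = list(component_availabilities)
    if any(not 0.0 <= a <= 1.0 for a in values):
        raise ValueError("each availability must be a fraction between 0 and 1")
    return prod(values)


if __name__ == "__main__":
    chain = {
        "provider data center": 0.9995,  # the 99.95 percent guarantee from the text
        "internet path": 0.999,          # hypothetical figure, for illustration only
        "customer access network": 0.999,  # hypothetical figure, for illustration only
    }
    combined = end_to_end_availability(chain.values())
    print(f"End-to-end availability: {combined:.4%}")
```

Under these assumed figures the chain as a whole is available about 99.75 percent of the time, noticeably worse than the data center alone, because every added link can only lower the product.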

In response, some companies are creating new, more cost-effective architectures that address the need for both resiliency and high availability. The assumption is that if applications are architected correctly, there is no need for disaster recovery because the services are designed to be resistant to failure.

Netflix famously created Chaos Monkey, a tool that randomly kills instances and other system components, as it migrated out of its private data centers into AWS. In July 2012, Netflix released it as an open source tool for use on Amazon Elastic Compute Cloud (EC2).
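The idea behind this kind of failure injection can be shown with a toy model. The sketch below is not Netflix's code and does not call any AWS API; it only simulates an in-memory fleet, and the names `Instance`, `kill_random_instance`, and `service_available` are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    healthy: bool = True


def kill_random_instance(fleet, rng):
    """Mark one randomly chosen healthy instance as failed and return it.

    Returns None when no healthy instance is left to kill.
    """
    candidates = [inst for inst in fleet if inst.healthy]
    if not candidates:
        return None
    victim = rng.choice(candidates)
    victim.healthy = False
    return victim


def service_available(fleet, min_healthy=1):
    """The simulated service is up while at least `min_healthy` instances survive."""
    return sum(inst.healthy for inst in fleet) >= min_healthy


if __name__ == "__main__":
    rng = random.Random(2013)  # seeded so repeated runs pick the same victims
    fleet = [Instance(f"web-{n}") for n in range(3)]
    for round_number in range(1, 4):
        victim = kill_random_instance(fleet, rng)
        status = "up" if service_available(fleet) else "DOWN"
        print(f"round {round_number}: killed {victim.name}, service {status}")
```

With three redundant instances the simulated service survives the first two kills and fails only on the third. In a real deployment, failed instances would be replaced automatically, and the point of the exercise is to confirm that happens before a genuine outage tests it.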

Improving application resiliency solves only part of the problem because anything crossing the Internet is still at risk of outages. Companies that have come to rely on cloud-based, business-critical applications need to start pressuring providers to package their services with reliable end-to-end networks so they can deliver real SLAs from the data center all the way out to user endpoints, wherever they may be.

In conclusion, consumer products such as Netflix have been able to get away with delivering occasional degraded service because more often than not users will just shrug and blame it on the Internet. Cloud application vendors targeting businesses, such as supply-chain or financial services, cannot afford to be so sanguine about the networks outside their control.

By bundling cloud services with more reliable network access, companies can gain the cost and flexibility advantages of the cloud, plus the peace of mind from knowing the service will in fact be delivered—guaranteed.

Tags: cloud, netflix

About the Author

Beth Cohen

Beth Cohen is a cloud strategist for Verizon, helping to develop cutting-edge products for the next generation. Previously, Beth was president of Luth Computer Specialists, an independent consultancy specializing in cloud-focused solutions to help enterprises leverage the efficiencies of cloud architectures and technologies, a senior cloud architect with Cloud Technology Partners, and the director of engineering IT for BBN Corporation, where she was involved with the initial development of the Internet and worked on some of the hottest networking and web technology protocols in their infancy.
