Service Providers Need to Step Up When Handling Web Outages
Global Internet penetration has grown by leaps and bounds in the past decade. Usage ranges from casual browsing to multibillion-dollar businesses that are run online. With such a wide range of usage, the impact of a web outage varies just as widely, from a mild inconvenience to several million dollars in lost business.
Service providers (software, platform, and infrastructure) are continually working on new strategies to reduce both the frequency of such outages and their impact when they do happen. Tighter service level agreements are being enforced to guarantee higher levels of reliability. We’ve recently seen the extent of the impact when services such as Google fail for just five minutes or when Amazon’s site is down for forty minutes.
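To put service level agreements in perspective, an availability target can be converted into a downtime budget with simple arithmetic. The sketch below is illustrative only: the function name and the 30-day billing period are assumptions made for this example, not terms taken from any provider's actual agreement.

```python
def allowed_downtime_minutes(availability_percent: float,
                             period_days: float = 30.0) -> float:
    """Return the downtime, in minutes, that an availability target permits.

    Both the function name and the default 30-day period are assumptions
    made for illustration; real agreements define their own periods.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    return total_minutes * (1.0 - availability_percent / 100.0)


if __name__ == "__main__":
    # Compare a few commonly quoted availability targets.
    for target in (99.0, 99.9, 99.99):
        budget = allowed_downtime_minutes(target)
        print(f"{target}% over 30 days allows about {budget:.1f} minutes of downtime")
```

Under these assumptions, a 99.9 percent target leaves roughly 43 minutes of downtime in a 30-day period, so a single forty-minute outage would use up nearly all of it.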
Amazon’s outages over the past year are not news. The Amazon Web Services failure in June 2012 raised many concerns because of the number of businesses that were affected. It forced those businesses to think hard about alternatives that would avoid such scenarios in the future. A multi-cloud deployment is one popular plan that businesses are beginning to consider.
There are other, simpler proactive steps that can be taken, including using monitoring tools, running separate deployments for specific customers, and being transparent with customers during downtime. What is interesting is that we live in an age where it is not only service providers who look for backup plans and mitigation strategies; their customers also play an equally important role in defining a business’s backup strategy.
Obviously, these solutions are not always affordable. Sometimes engineering a backup strategy costs more than one to two days of service downtime. That said, you may be a medical group, a bank, a stock exchange firm, or a high-traffic online commerce portal that cannot afford even a few hours of downtime. How much downtime you can tolerate, and how dependent you are on your service provider, is a call that each business has to make for itself.
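The trade-off between the cost of a backup strategy and the cost of downtime can be sketched as a back-of-the-envelope comparison. Everything in the example below is hypothetical: the function names, the outage frequencies, and the dollar figures are placeholders chosen for illustration, not data from any real business.

```python
def expected_annual_outage_loss(outages_per_year: float,
                                hours_per_outage: float,
                                loss_per_hour: float) -> float:
    """Estimate the yearly loss from outages as frequency x duration x hourly loss."""
    return outages_per_year * hours_per_outage * loss_per_hour


def backup_is_worth_it(annual_backup_cost: float,
                       outages_per_year: float,
                       hours_per_outage: float,
                       loss_per_hour: float) -> bool:
    """Return True when the expected outage loss exceeds the backup cost."""
    loss = expected_annual_outage_loss(outages_per_year, hours_per_outage, loss_per_hour)
    return loss > annual_backup_cost


if __name__ == "__main__":
    # Hypothetical small site: long outages, but little revenue at stake.
    print(backup_is_worth_it(annual_backup_cost=20_000,
                             outages_per_year=2,
                             hours_per_outage=12,
                             loss_per_hour=50))

    # Hypothetical high-traffic commerce portal: short outages, large hourly loss.
    print(backup_is_worth_it(annual_backup_cost=250_000,
                             outages_per_year=2,
                             hours_per_outage=4,
                             loss_per_hour=100_000))
```

A comparison like this leaves out harder-to-quantify costs such as lost customer trust, so it is better treated as a starting point for the decision than as the decision itself.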
While service providers continue to work on better engineering practices, one area in which they are often said to fall short is transparency in communicating issues to their customers. Providers like Amazon have externally facing sites that list service availability, but the question is how well those sites work and how much detail they provide during an outage.
Every outage offers all parties involved several lessons on how to mitigate such failures at an engineering and financial level and how to preserve the trust and goodwill built over years. At the end of the day, it is that enterprise trust that keeps the relationship going strong even amid such failures, and that holds even for large players like Amazon.