Hystrix: Open Source Cloud Resiliency Release from Netflix
In a cloud environment, service failure is inevitable, and teams must augment traditional application infrastructure with new, cloud-aware components.
At Netflix, Hystrix protects “tens of billions of thread-isolated and hundreds of billions of semaphore-isolated calls.” Hystrix is an open source library that improves cloud resiliency and fault tolerance by preventing cascading failures, isolating downed services, and rerouting service connections.
Protecting applications with Hystrix requires re-engineering method calls, adding fallback options, and carefully configuring thread pools. Developers can choose to chain actions, which may fail fast, fail silently, or invoke custom fallback code. Fallback code can emit monitoring events, perform recovery actions, and trigger circuit breakers.
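To make the fail-fast, fail-silent, and custom-fallback options concrete, here is a minimal Java sketch. HystrixCommand separates the primary action (`run()`) from the fallback (`getFallback()`). The `GuardedCommand` class below mimics that split but is a hypothetical stand-in, not the Hystrix API. The subclasses and their return values are invented for illustration.

```java
import java.util.Collections;
import java.util.List;

// Hypothetical wrapper modeled loosely on the run()/getFallback() split
// used by HystrixCommand; this is NOT the Hystrix API itself.
abstract class GuardedCommand<R> {
    // The primary action, e.g. a remote service call.
    protected abstract R run() throws Exception;

    // Default: no fallback, so the failure surfaces to the caller (fail fast).
    protected R getFallback() {
        throw new IllegalStateException("command failed and has no fallback");
    }

    // Runs the primary action; on any exception, delegates to the fallback.
    public R execute() {
        try {
            return run();
        } catch (Exception e) {
            return getFallback();
        }
    }
}

// Fail fast: inherits the default fallback, which throws.
class FailFastLookup extends GuardedCommand<String> {
    protected String run() throws Exception {
        throw new Exception("service unavailable"); // simulated outage
    }
}

// Fail silently: returns an empty result instead of an error.
class FailSilentRecommendations extends GuardedCommand<List<String>> {
    protected List<String> run() throws Exception {
        throw new Exception("service unavailable"); // simulated outage
    }

    @Override
    protected List<String> getFallback() {
        return Collections.emptyList();
    }
}

// Custom fallback: substitutes a static default value.
class DefaultGreeting extends GuardedCommand<String> {
    protected String run() throws Exception {
        throw new Exception("service unavailable"); // simulated outage
    }

    @Override
    protected String getFallback() {
        return "Hello, guest";
    }
}
```

Calling `new DefaultGreeting().execute()` returns the default string instead of an error. `FailSilentRecommendations` returns an empty list. `FailFastLookup` surfaces the failure to its caller immediately.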
Hystrix provides the scaffolding and development guardrails for the classic fault-tolerance scenario, in which a primary and a secondary service are backed by a fallback action. Developers wrap service actions by extending the HystrixCommand class. A façade command is placed in front of the primary and secondary options, enabling a recovery action if both services fail.
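The façade arrangement can be sketched without the library. The `PrimarySecondaryFacade` class below is hypothetical. It simply tries the primary, fails over to the secondary, and runs the recovery action only if both throw. Hystrix's own documentation may route between primary and secondary differently, so treat this as an illustration of the control flow described above rather than the library's implementation.

```java
import java.util.function.Supplier;

// Hypothetical facade: primary first, then secondary, then a recovery action.
final class PrimarySecondaryFacade<R> {
    private final Supplier<R> primary;
    private final Supplier<R> secondary;
    private final Supplier<R> recovery;

    PrimarySecondaryFacade(Supplier<R> primary, Supplier<R> secondary, Supplier<R> recovery) {
        this.primary = primary;
        this.secondary = secondary;
        this.recovery = recovery;
    }

    // Callers depend only on this method, never on a specific service.
    R execute() {
        try {
            return primary.get();
        } catch (RuntimeException primaryFailure) {
            try {
                return secondary.get(); // fail over to the secondary service
            } catch (RuntimeException secondaryFailure) {
                return recovery.get(); // both services failed: recover
            }
        }
    }
}
```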
Building resilient cloud applications requires coding defensively against resource starvation and lock contention. Careful isolation of service methods increases resiliency. Developers can use Hystrix to assign commands to logical groups and to dedicated thread pools. Thread pools may also host monitoring, metrics publishing, and caching services.
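Thread-pool isolation is essentially a bulkhead. Each group of commands gets its own bounded pool, so a slow dependency can exhaust only its own threads. The sketch below uses plain `java.util.concurrent` rather than Hystrix. The `GroupIsolation` class, its pool sizes, and the per-call timeout and fallback parameters are assumptions for illustration.

```java
import java.util.Map;
import java.util.concurrent.*;

// Hypothetical bulkhead: one small, bounded thread pool per command group.
final class GroupIsolation {
    private final Map<String, ExecutorService> pools = new ConcurrentHashMap<>();
    private final int threadsPerGroup;
    private final int queueSize;

    GroupIsolation(int threadsPerGroup, int queueSize) {
        this.threadsPerGroup = threadsPerGroup;
        this.queueSize = queueSize;
    }

    // Lazily creates the pool for a group; pools are never shared across groups.
    private ExecutorService poolFor(String group) {
        return pools.computeIfAbsent(group, g -> {
            BlockingQueue<Runnable> queue = queueSize > 0
                    ? new ArrayBlockingQueue<Runnable>(queueSize)
                    : new SynchronousQueue<Runnable>();
            // AbortPolicy makes submit() throw when threads and queue are full.
            return new ThreadPoolExecutor(threadsPerGroup, threadsPerGroup,
                    0L, TimeUnit.MILLISECONDS, queue, new ThreadPoolExecutor.AbortPolicy());
        });
    }

    // Runs the task on its group's pool; returns the fallback on rejection,
    // timeout, or failure instead of blocking the caller indefinitely.
    <R> R call(String group, Callable<R> task, long timeoutMs, R fallback) {
        Future<R> future;
        try {
            future = poolFor(group).submit(task);
        } catch (RejectedExecutionException rejected) {
            return fallback; // pool saturated: reject rather than queue forever
        }
        try {
            return future.get(timeoutMs, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            future.cancel(true); // interrupt the stuck call to free the thread
            return fallback;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return fallback;
        } catch (ExecutionException e) {
            return fallback; // the task itself threw
        }
    }

    void shutdown() {
        pools.values().forEach(ExecutorService::shutdownNow);
    }
}
```

Rejecting work when a pool and its queue are full, rather than queueing indefinitely, keeps one failing dependency from starving the callers of every other dependency.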
When 0.3 percent downtime across one billion requests translates into 3 million failures, automated circuit breakers let Netflix bypass downed services and keep the user experience fast. Circuit breakers create a control point in the cloud architecture: each service call reports its outcome (success, failure, rejection, or timeout) to a circuit breaker.
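The 0.3 percent figure appears to come from the illustrative scenario in the Hystrix documentation. Assume an application that depends on 30 services, each with 99.99 percent uptime and failing independently, and serving one billion requests. Under those assumptions the arithmetic works out as follows:

```latex
\begin{aligned}
0.9999^{30} &\approx 0.9970 \quad (\approx 99.7\%\ \text{aggregate uptime}) \\
1 - 0.9970 &= 0.0030 \quad (0.3\%\ \text{downtime}) \\
0.003 \times 10^{9}\ \text{requests} &= 3 \times 10^{6}\ \text{failures}
\end{aligned}
```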
The breakers maintain rolling statistics and use them to decide when the circuit should trip. When a circuit trips, service requests are short-circuited and immediately flow down an alternate path. After a recovery period, the circuit breaker resets, and the system again attempts the preferred service call path.
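The trip-and-reset cycle can be sketched as a small state machine. The `SimpleCircuitBreaker` class below is hypothetical and simplified. Hystrix keeps time-bucketed rolling statistics, whereas this sketch tracks the outcomes of the last `windowSize` calls. It trips when the failure ratio reaches `errorThreshold`, provided at least `minRequests` calls are recorded. After `sleepWindowMs` it lets a single trial call through before closing again. The injectable clock exists only to make the behavior testable.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.LongSupplier;
import java.util.function.Supplier;

// Hypothetical, simplified circuit breaker; not Hystrix code.
final class SimpleCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;        // how many recent outcomes to keep
    private final int minRequests;       // minimum sample before tripping
    private final double errorThreshold; // e.g. 0.5 trips at 50% failures
    private final long sleepWindowMs;    // recovery period while open
    private final LongSupplier clock;    // current time in ms (injectable)

    private final Deque<Boolean> window = new ArrayDeque<>(); // true = failure
    private State state = State.CLOSED;
    private long openedAt;

    SimpleCircuitBreaker(int windowSize, int minRequests, double errorThreshold,
                         long sleepWindowMs, LongSupplier clock) {
        this.windowSize = windowSize;
        this.minRequests = minRequests;
        this.errorThreshold = errorThreshold;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // synchronized for simplicity; this serializes calls, which a
    // production breaker would avoid.
    synchronized <R> R call(Supplier<R> primary, Supplier<R> fallback) {
        if (state == State.OPEN) {
            if (clock.getAsLong() - openedAt < sleepWindowMs) {
                return fallback.get(); // short-circuit: primary is never invoked
            }
            state = State.HALF_OPEN; // recovery period elapsed: allow one trial
        }
        try {
            R result = primary.get();
            if (state == State.HALF_OPEN) {
                state = State.CLOSED; // trial succeeded: resume the preferred path
                window.clear();
            } else {
                record(false);
            }
            return result;
        } catch (RuntimeException e) {
            if (state == State.HALF_OPEN) {
                trip(); // trial failed: stay open for another recovery period
            } else {
                record(true);
                if (shouldTrip()) {
                    trip();
                }
            }
            return fallback.get();
        }
    }

    synchronized State state() {
        return state;
    }

    private void record(boolean failure) {
        window.addLast(failure);
        if (window.size() > windowSize) {
            window.removeFirst(); // drop the oldest outcome
        }
    }

    private boolean shouldTrip() {
        if (window.size() < minRequests) {
            return false;
        }
        long failures = window.stream().filter(f -> f).count();
        return (double) failures / window.size() >= errorThreshold;
    }

    private void trip() {
        state = State.OPEN;
        openedAt = clock.getAsLong();
        window.clear();
    }
}
```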
Netflix may soon release its circuit breaker dashboard, which visually depicts resiliency in action.
Hystrix joins a growing list of Netflix-sponsored open source cloud projects. These projects add configuration management, application deployment, cloud management, autoscaling, fast asynchronous logging, change management, service registries, instance monitoring, and application monitoring on top of the base Amazon Web Services (AWS) Infrastructure as a Service (IaaS) offerings.
You can track Netflix Open Source Software announcements by following @NetflixOSS on Twitter.