DevOps in the Trenches: Get Started with Metrics
While it is nice when an enterprise recognizes the value of getting everyone working together in an end-to-end value stream, the reality is that’s not where most DevOps initiatives start. Often, one particular silo decides there is value in working more closely with others and seeks ways to do so.
I call this “DevOps in the trenches,” and while it’s not ideal, it is a way to get DevOps bootstrapped and gain some benefits from a more collaborative delivery process.
Here are some tips for how to get started doing DevOps in the trenches, with key DevOps metrics to help.
If you are in software development, work with testing to reduce the cycle time for building and testing changes. Benchmark how long it takes a change to get through your build and test process, and then brainstorm with your testing organization about what you ultimately want your cycle time to be.
Use this metric as a forcing function to discuss progress, either during one of your daily standups or at a quick meeting each week. Help improve this number by putting in place a robust continuous integration capability that incorporates appropriate testing (static and dynamic) into the many builds you should be running each day.
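To make the cycle-time benchmark concrete, here is a minimal sketch of the arithmetic. It assumes you can export, for each change, the time it was committed and the time its build and test run finished; the `changes` records and the helper names are hypothetical, and a real export would come from your CI server. The median is used rather than the mean so that one unusually slow build does not dominate the number you discuss each week.

```python
from datetime import datetime, timedelta
from statistics import median

# Hypothetical records: (committed, build-and-test finished) as ISO timestamps.
# In practice these would be exported from your CI server.
changes = [
    ("2024-03-04T09:00", "2024-03-04T11:30"),
    ("2024-03-04T13:15", "2024-03-04T14:00"),
    ("2024-03-05T10:00", "2024-03-05T16:00"),
    ("2024-03-06T08:45", "2024-03-06T09:45"),
]

def cycle_times(records):
    """Return the build-and-test cycle time of each change as a timedelta."""
    return [
        datetime.fromisoformat(done) - datetime.fromisoformat(committed)
        for committed, done in records
    ]

def median_cycle_time(records):
    """Median cycle time; less sensitive to a single slow build than the mean."""
    return median(cycle_times(records))

# Hypothetical target agreed with the testing organization.
target = timedelta(hours=1)
current = median_cycle_time(changes)
print(f"Median cycle time: {current} (target {target})")
```

With these sample records the median is 1:45:00, so the team would still be 45 minutes away from its one-hour target.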
If your test organization does not have automation skills, dedicate time for someone on your development team to serve as a software development engineer in test (SDET), and help them create a maintainable regression suite that can be run frequently. Also build a simple smoke test for your application that allows testers to validate that a new build works well enough for them to spend time testing it.
If you are in software testing, work with operations to decrease the failure rate of your application in production. These failures often stem from manual, untestable, poorly documented procedures for setting up production environments and for installing and configuring the application.
Benchmark how frequently deployments fail in each environment your application is set up in: QA, staging, and production. Ask operations if you can “test” these deployment procedures and specify the steps involved more clearly.
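The deployment failure rate is simply failed deployments divided by attempted deployments, tracked separately for each environment. The sketch below assumes a hypothetical log of `(environment, succeeded)` pairs; the data and the `failure_rates` helper are illustrative, not taken from any particular deployment tool.

```python
from collections import Counter

# Hypothetical deployment log: (environment, whether the deployment succeeded).
deployments = [
    ("qa", True), ("qa", False), ("qa", True), ("qa", True),
    ("staging", True), ("staging", False),
    ("production", True), ("production", True),
    ("production", False), ("production", True),
]

def failure_rates(records):
    """Return failed / attempted deployments for each environment."""
    attempts = Counter()
    failures = Counter()
    for env, succeeded in records:
        attempts[env] += 1
        if not succeeded:
            failures[env] += 1
    return {env: failures[env] / attempts[env] for env in attempts}

for env, rate in failure_rates(deployments).items():
    print(f"{env}: {rate:.0%}")
```

Keeping the environments separate matters: a high failure rate in staging but not in QA, as in this sample, points at steps that differ between those environments.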
Because testing deployment procedures directly in production can degrade production quality, run these tests in a production-like environment such as staging. Then have a tester sit with operations during the next production deployment to observe how testable the process is.
Meet regularly with operations to discuss the deployment failure rate you are tracking and other ways to improve the testability of your deployments.
If you are in operations, work with development to decrease the time it takes to fix defects found in production. Doing so will begin to reduce the mean time to repair (MTTR) an application when it fails.
Benchmark how long it takes to close tickets, and use this number in frequent discussions with development about how to collaborate better and restore service more quickly. Evaluate your ticketing process for inefficiencies, points of confusion, and queues where work is stuck behind a process bottleneck, and seek to reduce or remove them. Invite someone from development to a weekly meeting to discuss outstanding tickets and progress. Review how long tickets are taking to close, whether that time is trending up or down, and other ways to improve the process.
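For the weekly review, a rough sketch of the ticket-closing metric might group closed tickets by the ISO week they were closed in, average the hours each took, and compare the latest week with the one before. This treats time-to-close as a stand-in for MTTR, which is an approximation; the `tickets` data and the helper names are hypothetical, and real timestamps would come from your ticketing system's export.

```python
from datetime import datetime
from statistics import mean

# Hypothetical production tickets: (opened, closed) as ISO timestamps.
tickets = [
    ("2024-03-04T08:00", "2024-03-04T20:00"),  # 12 hours
    ("2024-03-06T09:00", "2024-03-07T09:00"),  # 24 hours
    ("2024-03-11T10:00", "2024-03-11T18:00"),  # 8 hours
    ("2024-03-13T07:00", "2024-03-13T17:00"),  # 10 hours
]

def hours_to_close_by_week(records):
    """Mean hours to close, keyed by (ISO year, ISO week) of the close date."""
    by_week = {}
    for opened, closed in records:
        start = datetime.fromisoformat(opened)
        end = datetime.fromisoformat(closed)
        iso = end.isocalendar()
        key = (iso[0], iso[1])
        by_week.setdefault(key, []).append((end - start).total_seconds() / 3600)
    # Sorting the keys keeps the weeks in chronological order.
    return {key: mean(hours) for key, hours in sorted(by_week.items())}

def trend(weekly):
    """Compare the most recent week with the previous one."""
    values = list(weekly.values())
    if len(values) < 2:
        return "not enough data"
    if values[-1] < values[-2]:
        return "down"
    if values[-1] > values[-2]:
        return "up"
    return "flat"

weekly = hours_to_close_by_week(tickets)
for (year, week), hours in weekly.items():
    print(f"{year}-W{week:02d}: {hours:.1f} h to close on average")
print("Trend:", trend(weekly))
```

Comparing only the last two weeks is deliberately simple; a team with more history might prefer a rolling average so that one unusual week does not flip the trend.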