Handling a Check Failure in Test Automation

What happens on your team when a check (what some call “automated test”) fails?

If you follow the common practice of automating manual test cases, then the authors of the automation are expected to follow up and diagnose the problem. Usually, it's a failure in the automation itself (everybody on the team knows that), so it doesn't get much priority or respect; even when it does, resolving the failure is time-consuming and labor-intensive.

Alternatively, the author of the automation watches it proceed to see if something goes wrong. That shortens the communication chain, but it’s still very time-consuming and expensive, and it doesn’t scale at all.

Regression tests or checks that are effective at managing quality risk must be able to send action items outside the test/QA team quickly. False positives, such as reports of quality issues that ultimately turn out not to concern the product at all, are wasteful, and they corrode trust in the test/QA team. Quality communications must be quick and trustworthy for test/QA to be effective.

On check failure, people can look at flight-recorder logs leading up to the point of failure, but logs tend to be uneven in quality, verbose, and poorly suited to automated parsing. A person has to study them for them to have any value, so the onus is on test/QA, again, to follow up. Bottom-up testing, or testing at the service or API layer, helps, but the problem of uneven log quality remains. Mixing presentation with the data, such as English prose or HTML, bloats the logs.

Imagine, instead, an artifact of pure structured data, dense and succinct, produced whether the check passes or not. Steps are self-documenting in a hierarchy that reflects the code, and each is recorded as passed, failed, or blocked by an earlier failure.

The pattern language MetaAutomation puts all this information in efficient, pure data with a schema, even if the check needs to be run across multiple machines or application layers.
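To make the idea concrete, here is a minimal sketch in Python of how check code might record such a hierarchy as pure data. The class name, field names, and JSON output are illustrative assumptions, not MetaAutomation's actual schema or implementation.

    import json

    class CheckArtifact:
        """Records check steps as pure hierarchical data that mirrors the check code."""

        def __init__(self, check_name):
            self.root = {"check": check_name, "status": "pass", "steps": []}
            self._current = self.root
            self._blocked = False

        def step(self, name, action=None):
            """Run one step; 'action' may itself call self.step() to record substeps."""
            parent = self._current
            record = {"step": name, "status": "pass", "steps": []}
            parent["steps"].append(record)
            if self._blocked:
                record["status"] = "blocked"   # an earlier step failed, so this one never runs
                return
            self._current = record
            try:
                if action is not None:
                    action()
            except Exception as exc:           # capture the failure as data, not prose
                record["status"] = "fail"
                record["error"] = {"type": type(exc).__name__, "message": str(exc)}
                self.root["status"] = "fail"
                self._blocked = True
            finally:
                self._current = parent
            if record["status"] == "pass" and any(
                    s["status"] != "pass" for s in record["steps"]):
                record["status"] = "fail"      # a substep inside this step failed

        def to_json(self):
            return json.dumps(self.root, indent=2)

    # Usage: the nesting of step() calls mirrors the structure of the check code.
    # The step names and actions below are placeholders for real check actions.
    def add_item_to_cart():
        raise RuntimeError("cart service timeout")   # simulated product failure

    artifact = CheckArtifact("Place an order")
    artifact.step("Log in",
                  lambda: artifact.step("Submit credentials", lambda: None))
    artifact.step("Add item to cart", add_item_to_cart)
    artifact.step("Check out")                 # recorded as "blocked" by the earlier failure
    print(artifact.to_json())

Because each record is plain data rather than prose, an artifact like this can be validated against a schema and parsed automatically, whether the check runs on one machine or across several.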

A failed check can be retried immediately, and if it fails again, the second result is compared in detail to the first. Transient failures are screened out, and persistent failures are reproduced. Automated analysis can determine whether the failure is internal or external to the project, and can even identify a responsible developer in the product or test role as needed.
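A minimal sketch of that retry-and-compare logic, assuming the artifact shape from the sketch above, might look like the following; run_check and the verdict labels are placeholders, not MetaAutomation's API.

    # A minimal sketch of retry-and-compare, assuming the artifact shape from the
    # sketch above; this is not MetaAutomation's actual implementation.
    def failing_step(result):
        """Depth-first search for the first failed step; returns (step name, error type)."""
        for step in result.get("steps", []):
            if step["status"] == "fail":
                return step["step"], (step.get("error") or {}).get("type")
            found = failing_step(step)
            if found is not None:
                return found
        return None

    def handle_failure(first_result, run_check):
        """Retry a failed check once and compare the two structured results in detail."""
        second_result = run_check()            # immediate retry of the same check
        if second_result["status"] == "pass":
            # Transient failure: keep both artifacts on record, raise no alarm outside test/QA.
            return {"verdict": "transient", "results": [first_result, second_result]}
        if failing_step(first_result) == failing_step(second_result):
            # The exact failure reproduced; the paired artifacts carry the detail for triage.
            return {"verdict": "reproduced", "results": [first_result, second_result]}
        return {"verdict": "inconsistent", "results": [first_result, second_result]}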

If so configured, a product developer would receive an email if a) the exact failure was reproduced, and b) the check step, stack trace, and any other data added by check code indicate ownership.
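Continuing the same assumed data shapes, the notification decision might reduce to something like this; the ownership table and the send_email callable are hypothetical stand-ins for whatever the team actually configures.

    import json

    # Hypothetical ownership table; in practice the check code or a team-maintained map
    # would supply this data.
    STEP_OWNERS = {
        "Submit credentials": "auth-dev@example.com",
        "Add item to cart": "cart-dev@example.com",
    }

    def route_reproduced_failure(first_failure, second_failure, artifact, send_email):
        """Email a product developer only when (a) the exact failure reproduced and
        (b) the failing step's data indicates ownership; otherwise stay with test/QA."""
        if second_failure is None or first_failure != second_failure:
            return "not reproduced; stays with test/QA"       # condition (a) not met
        step_name, _error_type = second_failure
        owner = STEP_OWNERS.get(step_name)                    # condition (b): ownership from check data
        if owner is None:
            return "ownership unclear; stays with test/QA"
        send_email(to=owner,
                   subject="Reproduced check failure at step: " + step_name,
                   body=json.dumps(artifact, indent=2))       # the pure-data artifact is the report
        return "notified " + owner

    # first_failure and second_failure are the (step name, error type) pairs returned by
    # failing_step() in the sketch above; send_email is whatever mailer the team already uses.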

Atomic Check shows how to run end-to-end regression tests so fast and reliably that they can be run as check-in gates in large numbers. Check failures are so detailed that the probability a problem will need to be reproduced by hand is small.

This way, communications outside test/QA are both quick and trustworthy—and, therefore, effective.

Matt Griscom will be presenting his session MetaAutomation: Five Patterns for Test Automation at STARWEST 2015, September 27–October 2, in Anaheim, California.
