Are Your Metrics Causing Unintended Consequences?
Experiments at the Hawthorne Works, an electrical engineering factory, from 1924 to 1932 showed that workers seem to be more productive when they are observed. Originally the researchers were interested in determining what (if any) effect light had on worker productivity. Initial experiments indicated productivity improved with higher levels of light, but work levels slumped when the study ended.
Researchers concluded that the attention they paid to the workers made the workers feel that what they were doing was important, and that this feeling was what caused productivity to improve. The results were dubbed the Hawthorne effect.
Here is an unscientific expansion of the Hawthorne effect based on my personal observations (and probably yours as well): When you collect metrics that involve people, the act of measuring will change the way those people behave, but not always for the better.
Example 1: A test manager in one of my classes told me her boss demanded that she measure the performance of her staff based on objective metrics. Initially she decided to do this by “grading” each staff member on the number of test cases he or she wrote. I bet you can imagine what happened next: She was inundated with test cases. The testing was not better; employees just wrote more test cases, many of them duplicate or ineffective. I have heard similar stories where the testers were measured by how many bugs they found.
Example 2: The defect detection percentage (DDP) essentially measures what percentage of bugs are found by a test or QA group. When used correctly, it can give an indication of test effectiveness. (Some people would disagree, but that is an entirely different article.) Once it is instituted, though, many test managers feel the metric is grading their effort. In order to have a “better” DDP, they find ways to justify not counting certain escapes, e.g., “We didn’t find that bug because our test environment was inadequate, so it won’t count against us as a missed bug.” So they end up with a very high DDP, but the metric has little or no value.
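To make the arithmetic behind Example 2 concrete, here is a minimal sketch. It assumes the common definition of DDP as the bugs found by the test group divided by all known bugs (those found by test plus the escapes found later). The function name and every number are hypothetical, chosen only for illustration.

```python
def defect_detection_percentage(found_by_test: int, escapes: int) -> float:
    """Percentage of all known bugs that the test group caught.

    Assumed definition: found_by_test / (found_by_test + escapes) * 100.
    """
    total = found_by_test + escapes
    if total == 0:
        raise ValueError("no bugs recorded; DDP is undefined")
    return 100.0 * found_by_test / total


# Hypothetical release: 90 bugs found in test, 30 escaped to production.
found_by_test = 90
escapes = 30

# Hypothetical gaming: 20 of the 30 escapes are "excused" and not counted.
excused = 20

honest = defect_detection_percentage(found_by_test, escapes)
gamed = defect_detection_percentage(found_by_test, escapes - excused)

print(f"DDP counting every escape: {honest:.1f}%")   # 75.0%
print(f"DDP after excusing escapes: {gamed:.1f}%")   # 90.0%
```

In this made-up case the testing itself has not changed at all, yet the reported number climbs from 75 to 90 percent simply because fewer escapes are counted.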
Example 3: The Hawthorne effect is not limited to software. The problems within the Department of Veterans Affairs came about at least partially because the administrators were trying to reach a goal of providing service within a set timeframe. To meet requirements for performance incentives, they manipulated patient wait-time data, resulting in a scandal and total overhaul of the department.
Example 4: I am co-owner of a restaurant called Maddogs and Englishmen. Several years ago, we held a contest among the servers with a prize for whoever could sell the most of a particular wine. In retrospect, the results were predictable: The servers virtually refused to sell customers anything but the one type of wine involved in the contest.
The message is clear: Every time you choose to collect a metric, you should try to analyze what the unintended consequences might be and proceed with your eyes wide open.
Rick Craig is presenting the tutorials Measurement and Metrics for Test Managers and Essential Test Management and Planning at STARWEST, from October 12–17, 2014.