Creating a Test Strategy and Design for Testing Data
The information age. The digital revolution. The era of big data. Whatever label we put on it, there is an explosion of data and information available for planning, assessing, tracking, marketing, and helping us navigate and organize our daily lives.
From a tester’s perspective, the question remains: How do we meet the seemingly endless challenges of testing data? Data now comes from multiple sources, is transformed in many different ways, and is consumed by hundreds of other systems, so we must validate more data, more quickly, across heterogeneous platforms.
The first challenge of testing data is acquiring the necessary skills. It seems we must be part data scientist, domain expert, programmer, mathematician, data analyst, and database developer to design and carry out an effective test strategy for data. Testers must become more technical to keep up, and knowledge of data should be part of our technical arsenal.
The second challenge is creating test strategy and test design approaches that verify and validate large volumes and many types of data effectively and efficiently. Here is my "work in progress" checklist of things to consider when developing a test strategy and test design approach for data.
Data architecture: What input can testing provide to the data system design?
Technologies: What are the underlying technologies in which the data resides? Massively parallel processing (MPP) databases, data warehouses, distributed file systems and cloud computing platforms, clusters, Hadoop, ETL (extract, transform, load) capabilities, and the Internet are just some possibilities.
Test planning data: What data is available to drive testing? Are there help desk records, production data, or user experience data?
Test data environment: What automation, virtualization, and test data tools are required to maintain, analyze, monitor, and report on data testing?
Test data selection: Is a sampling strategy sufficient, or is comprehensive data testing coverage required?
Data variety: What range of data types must be tested, and to what depth?
Data integrity: What must be done before testing to check data quality dimensions such as conformity, accuracy, duplication, consistency, validity, and completeness?
Data performance: Which aspects of data performance need to be evaluated: data consumption speed, throughput, processing time, concurrency, caching, queuing, or others?
Data flow: What are the data sources, what transformations of the data take place, and where does the data end up?
Data compliance: Must the data be validated against internal or external standards, such as industry regulations like Basel III in banking, or standards published by ANSI, ISO, or IEEE?
Data security: What are the security requirements for the data? How will the data used in testing be sanitized or masked?
Data processing: Is the data processed in batches, in real time, interactively, or in some combination?
Test data management: How will the data be housed, managed, and maintained?
Data failover: What are the requirements for testing data failover?
Data structure: Is the data structured (fixed schema), semi-structured (flexible, self-describing schema), or unstructured (no predefined schema)? Generally, the bigger the data, the more we need to work with unstructured or semi-structured data.
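For the test data selection item in the checklist above, one common way to draw a sample from a data set too large to hold in memory is reservoir sampling (Algorithm R), which keeps a uniform random sample of k records in a single pass. The sketch below is a minimal Python illustration under that assumption, not a prescribed method; the function name `reservoir_sample` is hypothetical, and the optional seed exists so that a sample that exposes a defect can be reproduced.

```python
import random
from typing import Iterable, List, Optional, TypeVar

T = TypeVar("T")


def reservoir_sample(records: Iterable[T], k: int, seed: Optional[int] = None) -> List[T]:
    """Return a uniform random sample of up to k records using one pass (Algorithm R)."""
    if k < 0:
        raise ValueError("k must be non-negative")
    rng = random.Random(seed)  # a fixed seed makes the sample reproducible
    reservoir: List[T] = []
    for i, record in enumerate(records):
        if i < k:
            reservoir.append(record)  # fill the reservoir with the first k records
        else:
            j = rng.randint(0, i)  # uniform over 0..i, inclusive
            if j < k:
                reservoir[j] = record  # keeps record i with probability k / (i + 1)
    return reservoir
```

For example, `reservoir_sample(read_rows(), k=1000, seed=7)` would return 1,000 records, where `read_rows` stands for a hypothetical generator over the source data. A uniform sample can still miss rare but important record types, so those may need stratified sampling or full coverage.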
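For the data integrity item, a few of the listed quality dimensions (completeness, duplication, validity) can be checked with simple counts before functional testing begins. This is a minimal sketch that assumes records arrive as Python dicts; `profile_integrity`, its parameters, and the validation rules passed to it are illustrative assumptions, and conformity, accuracy, and consistency checks would need reference data not shown here.

```python
from collections import Counter
from typing import Any, Callable, Dict, List, Sequence


def profile_integrity(
    rows: Sequence[Dict[str, Any]],
    key: str,
    required: Sequence[str],
    validators: Dict[str, Callable[[Any], bool]],
) -> Dict[str, Any]:
    """Count completeness, duplication, and validity problems in dict-shaped records."""
    missing: Counter = Counter()
    invalid: Counter = Counter()
    for row in rows:
        # Completeness: a required field is absent, None, or an empty string.
        for field in required:
            if row.get(field) in (None, ""):
                missing[field] += 1
        # Validity: a populated field fails its rule (empty values are left to the completeness check).
        for field, is_valid in validators.items():
            value = row.get(field)
            if value not in (None, "") and not is_valid(value):
                invalid[field] += 1
    # Duplication: key values that occur more than once, in order of first appearance.
    key_counts = Counter(row.get(key) for row in rows)
    duplicate_keys: List[Any] = [k for k, n in key_counts.items() if n > 1]
    return {
        "rows": len(rows),
        "missing": dict(missing),
        "invalid": dict(invalid),
        "duplicate_keys": duplicate_keys,
    }
```

Passing `validators={"age": lambda v: 0 <= v <= 130}`, for instance, would count a record with a negative age as invalid, while a blank required `email` would be reported under `missing` instead.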
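For the data performance item, processing time and throughput of a single processing step can be measured in-process, as in the sketch below. It assumes a hypothetical per-record function `process`; it says nothing about concurrency, caching, or queuing, which normally call for dedicated load-testing tools, and results from short runs can be skewed by warm-up effects.

```python
import time
from typing import Any, Callable, Dict, Sequence


def measure_processing(process: Callable[[Any], Any], records: Sequence[Any]) -> Dict[str, float]:
    """Time `process` over each record; report elapsed time, throughput, and an approximate p95 latency."""
    latencies = []
    start = time.perf_counter()  # monotonic, high-resolution clock suited to interval timing
    for record in records:
        t0 = time.perf_counter()
        process(record)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    # Lower sample at the 95% position of the sorted latencies: a rough p95, not an interpolated percentile.
    p95 = latencies[int(0.95 * (len(latencies) - 1))] if latencies else 0.0
    throughput = len(records) / elapsed if elapsed > 0 else float("inf")
    return {"elapsed_s": elapsed, "records_per_s": throughput, "p95_latency_s": p95}
```

Comparing these figures across builds, rather than reading a single run in isolation, is usually what makes them useful as a regression signal.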
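For the data flow item, a common check at each hop (source to staging to target) is reconciliation: comparing row counts and an order-independent checksum of the rows. The sketch below assumes both extracts yield rows with the same column order and the same Python types; for a transforming step, the expected transformation would have to be applied to the source rows first. The function names are hypothetical, and matching checksums are strong but not absolute evidence, since hash collisions are possible though extremely unlikely.

```python
import hashlib
from typing import Dict, Iterable, Sequence, Tuple

_MASK_64 = (1 << 64) - 1


def row_digest(row: Sequence[object]) -> int:
    """Hash one row to a 64-bit integer; repr() keeps None distinct from '' and 1 from '1'."""
    encoded = "\x1f".join(repr(v) for v in row).encode("utf-8")
    return int.from_bytes(hashlib.sha256(encoded).digest()[:8], "big")


def fingerprint(rows: Iterable[Sequence[object]]) -> Tuple[int, int]:
    """Return (row count, order-independent checksum) for a collection of rows."""
    count, checksum = 0, 0
    for row in rows:
        count += 1
        # Addition modulo 2**64 ignores row order but, unlike XOR, does not cancel out duplicate rows.
        checksum = (checksum + row_digest(row)) & _MASK_64
    return count, checksum


def reconcile(source: Iterable[Sequence[object]], target: Iterable[Sequence[object]]) -> Dict[str, object]:
    """Compare source and target extracts by row count and checksum."""
    src_count, src_sum = fingerprint(source)
    tgt_count, tgt_sum = fingerprint(target)
    return {
        "source_rows": src_count,
        "target_rows": tgt_count,
        "counts_match": src_count == tgt_count,
        "checksums_match": src_sum == tgt_sum,
    }
```

When the checksums disagree, the next step would typically be to compare rows by key to locate the records that were dropped, duplicated, or altered.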
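For the data security item, one masking technique is keyed pseudonymization: each sensitive value is replaced by an HMAC-SHA256 token, so the same input always maps to the same token (joins across tables still line up), and without the key the tokens cannot feasibly be recomputed or reversed. This is a sketch of that one technique, not the only option; format-preserving encryption and synthetic data are alternatives. The function names and the idea of a per-field `maskers` mapping are assumptions, and the key shown in the usage note is hypothetical and would normally come from a secrets store outside the test environment.

```python
import hashlib
import hmac
from typing import Any, Callable, Dict


def pseudonymize(value: str, secret_key: bytes, prefix: str = "tok_") -> str:
    """Map a sensitive value to a stable, keyed token (HMAC-SHA256)."""
    digest = hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:16]  # first 64 bits of the digest; keep more if collisions matter


def mask_email(email: str, secret_key: bytes) -> str:
    """Replace an address with a token that still looks like an email (example.com is a reserved domain)."""
    return pseudonymize(email, secret_key, prefix="user_") + "@example.com"


def mask_record(record: Dict[str, Any], maskers: Dict[str, Callable[[str], str]]) -> Dict[str, Any]:
    """Return a copy of `record` with each listed field replaced by its masked value."""
    masked = dict(record)  # the source record is left untouched
    for field, mask in maskers.items():
        if masked.get(field) is not None:
            masked[field] = mask(str(masked[field]))
    return masked
```

With a hypothetical key `key = b"test-only-key"`, the mapping `{"name": lambda v: pseudonymize(v, key), "email": lambda v: mask_email(v, key)}` would mask those two fields and pass everything else, such as numeric IDs, through unchanged.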
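For the data structure item, semi-structured data can drift from record to record, so a lightweight check is to compare each record against the fields and types the tests expect. The sketch below assumes JSON Lines input (one JSON object per line) and a hypothetical field-to-type mapping; a real suite might use a JSON Schema validator instead. Note that JSON numbers without a decimal point parse as `int`, so a field declared as `float` would need to accept both.

```python
import json
from typing import Any, Dict, Iterable, List


def check_schema(record: Dict[str, Any], expected: Dict[str, type]) -> List[str]:
    """List the ways a parsed record departs from an expected field-to-type mapping."""
    problems: List[str] = []
    for field, expected_type in expected.items():
        if field not in record:
            problems.append(f"missing field: {field}")
            continue
        value = record[field]
        # bool is a subclass of int in Python, so JSON true/false would otherwise pass as an int.
        wrong_bool = isinstance(value, bool) and expected_type is not bool
        if wrong_bool or not isinstance(value, expected_type):
            problems.append(
                f"wrong type for {field}: expected {expected_type.__name__}, got {type(value).__name__}"
            )
    for field in sorted(record.keys() - expected.keys()):
        problems.append(f"unexpected field: {field}")
    return problems


def scan_json_lines(lines: Iterable[str], expected: Dict[str, type]) -> Dict[int, List[str]]:
    """Map 1-based line numbers to schema problems; clean lines are omitted."""
    report: Dict[int, List[str]] = {}
    for lineno, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # skip blank lines
        try:
            record = json.loads(line)
        except json.JSONDecodeError as exc:
            report[lineno] = [f"invalid JSON: {exc.msg}"]
            continue
        if not isinstance(record, dict):
            report[lineno] = ["top-level value is not an object"]
            continue
        problems = check_schema(record, expected)
        if problems:
            report[lineno] = problems
    return report
```

Tracking how often each kind of problem appears over time can also show when an upstream source has quietly changed its schema.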
We have moved from megabytes to terabytes and now, with big data, to petabytes. A well-formed test strategy and test design, along with the skills to carry them out, are increasingly critical for today’s testers.