Preparing to Test for Big Data
One of the latest buzzwords in the IT industry is big data. Although the discipline is still in its infancy, with enterprises only beginning to mature their big data engineering practices, its promise is huge, with industry revenue forecast at $1–1.5 billion by 2015.
Organizations both small and large are scrambling to build the expertise, skills, and infrastructure needed to leverage big data for their businesses. Driven by business rules, complex algorithms, market analyses, and continuous self-learning among both machines and humans, big data is an area where a failed implementation can have significant adverse effects. This puts additional pressure on the testing team to prepare effectively for a big data test effort.
Here are the key points that will set a testing team on a path to success:
- Leverage existing knowledge of data warehousing and business intelligence testing, but understand that big data differs from both in significant ways. Use those assets (testers, tools, existing test artifacts) as needed to build expertise in testing unstructured and dynamic data. To realize the opportunities, recognize the challenges of building big data solutions and mitigate them through your test strategy rather than retrofitting an existing database testing approach.
- Unlike traditional test efforts, understand that in most cases the test team will not have access to end users. This is a core challenge: the tester must simulate user behavior with little or no interaction with actual users. Look for data sampling opportunities wherever possible, including call logs, user feedback, usage patterns, and collaborative filters from previous releases, and keep in mind that ongoing brainstorming with the product team is important for arriving at test scenarios and an optimized test matrix.
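One practical way to sample usage data at the volumes big data implies is reservoir sampling, which draws a uniform random sample from a log stream of unknown length without ever holding the full stream in memory. The sketch below is illustrative only; the log format and function name are hypothetical, not part of any specific product's tooling.

```python
import random

def reservoir_sample(lines, k, seed=0):
    """Draw a uniform random sample of k items from a stream of unknown
    length (reservoir sampling), so huge log files never need to fit
    in memory."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    reservoir = []
    for i, line in enumerate(lines):
        if i < k:
            reservoir.append(line)          # fill the reservoir first
        else:
            j = rng.randint(0, i)           # replace with decreasing probability
            if j < k:
                reservoir[j] = line
    return reservoir

# Example: sample 3 "requests" from a simulated usage log stream.
log = (f"user{i % 7} action{i % 3}" for i in range(10_000))
sample = reservoir_sample(log, k=3)
```

A sample drawn this way can seed brainstorming with the product team: each sampled log line is a candidate usage pattern to turn into a test scenario.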
- Collaborate with developers and architects to learn big data technology and gain an edge. While traditional RDBMS testing is not going away, big data testing involves file systems, dynamic and unstructured data, and newer technologies, concepts, and implementation strategies such as Hadoop, NoSQL, and MapReduce. Look for group training opportunities so you can learn along with the rest of the team and stay on par with them.
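To make the MapReduce concept concrete, here is a minimal, framework-free sketch of its three phases (map, shuffle, reduce) applied to the classic word-count example. A real Hadoop job distributes these phases across a cluster; this single-process version only illustrates the programming model a tester needs to reason about.

```python
from collections import defaultdict

def map_phase(records):
    # Map: emit a (word, 1) pair for every word in every input record.
    for record in records:
        for word in record.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle/sort: group all values by key, as the framework would
    # before handing each key's values to a reducer.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups.items()

def reduce_phase(grouped):
    # Reduce: sum the counts for each word.
    return {key: sum(values) for key, values in grouped}

docs = ["big data testing", "big data big promise"]
counts = reduce_phase(shuffle(map_phase(docs)))
# counts == {"big": 3, "data": 2, "testing": 1, "promise": 1}
```

Understanding where data can be skewed, dropped, or double-counted in each phase is exactly the kind of insight a tester gains by working through the model alongside developers.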
- Participate up front while algorithms are being defined, as that stage offers tremendous learning and validation opportunities. Big data testing calls for more validation than verification, both in terms of what is feasible to test and in terms of testing ROI.
- Work with your team to redefine your role so you deliver a quality service that helps your organization monetize and maximize the value of its available data. Newer roles, such as data scientist, statistician, visualization engineer, and data curator, are necessary for big data engineering, and testers themselves will have to serve in some of these roles to fulfill their testing responsibilities. Work with your product team and the broader group on board to define roles and gain a clear understanding of your position within the team.