Do you test with DATA or does it test you?

In the words of Sir Arthur Conan Doyle: “It is a capital mistake to theorize before one has data. Insensibly one begins to twist facts to suit theories, instead of theories to suit facts.” Godfather of modern-day software testing? Maybe not, but the quote sums up the state of software testing today.

Quality at speed is the mantra for testing success as the clamor for faster product launches increases. However, testers often struggle to meet such stringent timelines. Incomplete testing and inaccurate reporting of test completion are the usual fallout of this time pressure. Another common pitfall is testing with a small volume of fictitious data that may bear no resemblance to actual production data, which reduces test effectiveness and ultimately lets defects slip through to production.



So, how can testers deliver their best in the least amount of time? Relying on development teams (programmers and DBAs) for data is not a solution, as developers have their own priorities. Data from development teams might not contain the right mix of positive and negative test data for a specific test scenario. In addition, developers typically lack the tester’s mindset that asks “What if?” when designing tests.

Masking production data is a solution used by many companies. While better than random test data, masked data is severely limited by:

  • Regulatory compliance, which makes access to production data almost a “no-no” (even with masking)
  • Differences between the test environment (generally a bare-bones version of production) and the production environment (integrated with third-party applications and running on high-end hardware), which limit how reliably production data can be reproduced in testing, in both quality and volume

It is not all bad news, though. Many commercial tools on the market can generate synthetic test data. The only grouse is cost: while large organizations with big IT budgets can afford these tools, small and medium-sized organizations need to look for better and cheaper ways to generate test data.
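As a rough illustration of what a low-cost approach can look like, here is a minimal sketch assuming Python and a hypothetical two-table schema (`Customer` and `Order`). It uses only the standard library's `random` module to generate synthetic records that mix positive cases with deliberately invalid (negative) ones, while every order still references an existing customer. The schema, field names, country list, and 20% negative ratio are all illustrative assumptions, not the API of any particular tool.

```python
import random
from dataclasses import dataclass


# Hypothetical schema, for illustration only.
@dataclass
class Customer:
    customer_id: int
    country: str
    email: str


@dataclass
class Order:
    order_id: int
    customer_id: int  # foreign key -> Customer.customer_id
    amount: float
    is_valid: bool    # True = positive case, False = deliberate negative case


COUNTRIES = ["US", "IN", "DE", "JP"]  # assumed sample of country codes


def make_customers(n: int, rng: random.Random) -> list[Customer]:
    """Create n customers with sequential IDs starting at 1."""
    return [
        Customer(i, rng.choice(COUNTRIES), f"user{i}@example.com")
        for i in range(1, n + 1)
    ]


def make_orders(customers: list[Customer], n: int,
                negative_ratio: float, rng: random.Random) -> list[Order]:
    """Create n orders, round(n * negative_ratio) of them negative cases."""
    customer_ids = [c.customer_id for c in customers]
    n_negative = round(n * negative_ratio)
    flags = [False] * n_negative + [True] * (n - n_negative)
    rng.shuffle(flags)  # spread negative cases through the data set

    orders = []
    for order_id, is_valid in enumerate(flags, start=1):
        if is_valid:
            amount = round(rng.uniform(1.0, 1000.0), 2)
        else:
            # Zero or negative amounts the system under test should reject.
            amount = rng.choice([0.0, -round(rng.uniform(1.0, 500.0), 2)])
        # Every order, valid or not, points at an existing customer,
        # so referential integrity holds across the whole data set.
        orders.append(Order(order_id, rng.choice(customer_ids), amount, is_valid))
    return orders


rng = random.Random(42)  # fixed seed -> reproducible data
customers = make_customers(10, rng)
orders = make_orders(customers, 100, negative_ratio=0.2, rng=rng)
print(sum(not o.is_valid for o in orders), "negative cases out of", len(orders))
```

Seeding `random.Random` makes each run reproducible, so a failing test can be rerun against exactly the same data. Keeping the negative cases referentially valid means they exercise business-rule validation rather than simply tripping a foreign-key constraint.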

When considering test data management solutions, organizations should evaluate:

  • The amortized cost of test data creation (the tool license cost spread over the number of times the tool is used)
  • The ability to integrate seamlessly with existing test automation scripts and test management tools (integration should not require modifying automation code)
  • Whether a database expert is needed to operate the tool (ideally, people with minimal SQL skills should be able to understand the database structure)
  • Customization capabilities, such as adding fields that may not be part of the original database schema (for example, country-specific data), while supporting data containment and maintaining referential integrity
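The amortized-cost criterion is simple arithmetic, but it can reverse a comparison based on headline license price. A minimal Python sketch, assuming the license is the only cost (ignoring training, infrastructure, and maintenance) and using made-up figures for two hypothetical tools:

```python
def amortized_cost(license_cost: float, times_used: int) -> float:
    """Tool license cost spread over the number of times the tool is used."""
    if times_used <= 0:
        raise ValueError("times_used must be a positive number of runs")
    return license_cost / times_used


# Hypothetical figures for two tools, for illustration only.
tool_a = amortized_cost(license_cost=50_000, times_used=500)  # 100.0 per use
tool_b = amortized_cost(license_cost=5_000, times_used=20)    # 250.0 per use
print(f"Tool A: {tool_a:.2f} per use, Tool B: {tool_b:.2f} per use")
```

In this made-up scenario, the tool whose license costs ten times as much is cheaper per use because it is used 25 times as often.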

In my next blog post, I will discuss a few best practices for creating the right test data and selecting test data management solutions.