In my previous post I elaborated on the key factors to be ensured in Data quality validation and end user & BI / report testing for an effective outcome. Data quality validation comprises of test for data completeness, data transformation and data quality. In this post I shall explain about the other two aspects that Data Warehouse Test strategy focuses on.
Load and performance testing
With an increase in the volume of data, stability and scalability become critical test parameters. Under stress from large transactional data volumes, data warehouses will typically not scale, and will eventually fail, unless they are tested and issues are fixed. To avoid such problems, the test team must design and execute series of tests under different loads. As part of this activity, the following tests can be executed:
- Shutdown the server during batch process and validate the result
- Perform ETL with load that is twice or thrice the maximum possible imagined data (for which the capacity is planned)
- Run huge volumes of ad-hoc queries mimicked from multiple users simultaneously
- Run large numbers of scheduled reports
- Monitor the timing of the reject processes and check system behavior when handling large volumes of rejected data
E2E integration and regression testing
Integration tests show how the application fits into the overall flow of all upstream and downstream applications. When designing Integration Tests, the focus of the tester should be on:
- How the overall process can break and focus on the integrations between different systems and their subsystems
- Validating system behavior when different types of data (different user profiles, different data types, different data volumes etc.) get processed and communication to the subsequent system
- Running custom-designed regression tests that simulate end user behavior (ensures success of user-acceptance tests).
Usage of techniques like scenarios based testing, risk based testing and model based testing will enhance the effectiveness of testing.
To sum it up, it is important for test teams to understand that testing a Data Warehouse implementation is a different ball game. Since a Data Warehouse primarily deals with data, a major portion of the test effort is spent on planning, designing and executing tests that are data oriented. These tests include running SQL queries, validating that ETL executes as expected, exceptions are handled effectively, application performance meets the SLAs and finally, ensuring that the integration points are working as expected. It will also be helpful if the team members have experience in debugging performance bottlenecks.
Another dimension of Data Warehouse testing is the dependency of the tests on the test environment. Since it is a known fact that in general, a test environment will not be robust (high end servers, clustering, load balancing, data volumes and data accuracy), this will have an impact on some of the tests. For example, simulated or masked data that might not be reflecting all the characters of production data may restrict the accuracy of performance tests. In other cases, some of the jobs may not fail under simulated test environment. Test teams should be wary of such limitations.
Detecting all possible defects may sometimes be complex, but planning will best help in identifying the most obvious and costly defects early in the life cycle. Finally, a group of Data Warehouse Architects, Business Analysts and Test teams working together during the initial planning and design phase is one of the time tested approaches that can help in identifying and eliminating potential failures.
I hope I have covered all the important aspects of DWH testing. Going further, readers are most welcome to share their views, experiences and other insights that I may not have covered in my blogs.