The success of any Data Warehouse (DWH) solution lies in its ability to not only analyze huge amounts of data over time but also to provide stakeholders and end-users meaningful options that are based on real-time data.
While details are elaborated below, it is essential that a good DWH test strategy should cover validation of loading of all required rows, correct execution of all transformations and successful completion of the cleansing operation. The team also needs to thoroughly test SQL queries, stored procedures or queries that produce aggregate or summary tables. Keeping in tune with emerging trends, it is also important for test teams to design and execute a set of test cases that are focused on customer experience.
Fig 1: Key components of an effective data warehouse test strategy
The focus of Data Warehouse test strategy is primarily on four key aspects:
- Data quality validation
- End user & BI / report testing
- Load and performance testing
- End-to-End (E2E) regression and integration testing
In this post, I’m going to elucidate on two of the four key aspects that data warehouse test strategy focuses on, data quality validation and end user & BI/report testing.
Data quality validation
This is core to any data warehouse tests and includes tests for data completeness, data transformation and data quality.
- Data completeness tests: Designed to verify if all the expected data loads into the data warehouse, ensuring that all records are completely loaded without errors in content quality or quantity.
- Data transformation tests: Designed to verify the accuracy of the transformation logic or transformation business rules.
- Data quality tests: Designed to validate system behavior when data is rejected (example: data inaccuracy or missing data) during data correction and substitution. Scenario-based tests and validation tests for the solutions’ reporting feature are part of this test.
Data quality validation should ensure:
- Extraction of data to the required fields
- Proper functioning of the extraction logic for each source system (historical and incremental loads)
- Security access to source systems for extraction scripts
- Updates to extract audit log and time stamping
- Completeness and accuracy of “source to extraction destination” transaction scripts, which are transforming the data as per the expected logic
- Historical & incremental load transformation for historical snap-shots
- Creation of relevant detailed and aggregated data sets
- Transaction audit log and time stamping
- No pilferage of data during transformation process and also during historical and incremental load
- Real-time or near-real time data loading without impacting performance adversely
- Temporary tables updating through multi-pass SQL statements with real-time or near-real time reporting and analytics
End user and BI / report testing
Extreme care should be taken while testing and the reports should be as clear and self-explanatory as possible. Usability, performance, data accuracy and preview and/or export to different formats are areas where most of the failures occur.
When designing tests for end user and BI / report testing, key points to address include:
- Data display on the business views and dashboard are as expected
- Users being able to see reports according to their user profile (authentication and authorization)
- Verification of report format and content by appropriate end users
- Verification of the accuracy and completeness of the scheduled reports
- OLAP, drill down report, cross tab report, parent / child report etc. are all working as expected
- ‘Analysis functions’ and ‘data analysis’ are working
- No pilferage of data between the source systems and the views
- Testing of replicated reports from old system to new system
- Previewing and/or exporting of reports to different formats such as spreadsheet, PDF, HTML, e-mail displays accurate and consistent data
- Print facility
- Where graphs and data in tabular format exist, both should reflect consistent data
In my following blog post I will be elaborating on the other two aspects, load and performance testing and end-to-end regression and integration testing.