This is the third post in the four-part series dealing with Business Intelligence Testing.
Before we begin talking about what ETL testing is, let’s recap the concepts of business intelligence and data warehouse testing (DW test).
Business intelligence is a process wherein business data is collected and turned into useful information for an organization. This data consists of records of a company’s daily transactions, such as its customer interactions, employee management, and finance administration.
Meanwhile, data warehouse testing is the process of developing and carrying out specific test cases that ensure all data in a warehouse is consistent with the organization’s framework.
As for ETL testing, it is a sub-component of data warehouse testing. It verifies that data extracted from source systems is transformed according to business intelligence requirements and then loaded correctly into a designated data warehouse.
Steps in the ETL Process
These are the three steps of the ETL process that testers work with:
- Extract: The first step is to extract the information from various data sources. These are often third-party databases such as MS SQL and Oracle DB. Data can also be extracted from CSV files.
- Transform: Once the data has been extracted, the next step is to transform it into usable schematic data by way of a cleansing operation. This is the part where any incomplete or inaccurate records are removed from the database.
- Load: The schematic data obtained from the previous step is then uploaded into an Online Analytical Processing (OLAP) data warehouse. The data will be stored in this warehouse and can be used later on for business intelligence purposes or for further analysis.
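To make the three steps concrete, here is a minimal sketch in Python. It assumes a small inline CSV as the source and an in-memory SQLite database standing in for the warehouse; the `sales` table, the column names, and the sample rows are all hypothetical and only there for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical CSV source; in practice this could be a file or a database export.
SOURCE_CSV = """customer_id,name,amount
1,Alice,120.50
2,Bob,
3,,75.00
4,Dana,not_a_number
5,Eve,40.25
"""


def extract(csv_text):
    """Extract: read raw rows from the CSV source."""
    return list(csv.DictReader(io.StringIO(csv_text)))


def transform(rows):
    """Transform: drop incomplete or inaccurate records and cast types."""
    clean = []
    for row in rows:
        if not row["name"] or not row["amount"]:
            continue  # incomplete record
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # inaccurate record (amount is not numeric)
        clean.append((int(row["customer_id"]), row["name"], amount))
    return clean


def load(conn, records):
    """Load: write the cleansed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS sales "
        "(customer_id INTEGER PRIMARY KEY, name TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for the real warehouse
raw_rows = extract(SOURCE_CSV)
clean_rows = transform(raw_rows)
load(conn, clean_rows)
loaded_count = conn.execute("SELECT COUNT(*) FROM sales").fetchone()[0]
print(f"extracted={len(raw_rows)} loaded={loaded_count}")
```

With the sample data, five rows are extracted and two survive cleansing, so the script prints `extracted=5 loaded=2`. A real pipeline would usually log or quarantine the rejected rows rather than silently dropping them.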
What To Test in ETL
Before collated data can be used for business intelligence, it must first be validated to ensure that no defects are present. Identifying data issues is the primary goal in ETL testing.
These are some of the common cases tested in ETL:
- Data mapping: The most vital test case in ETL testing is data mapping, since it ensures that the data obtained from the sources maps correctly to the target database. If there is any mismatch, the system fails.
- Data schema validation: This test case ensures that the data schema acquired from the source matches that of the target database.
- Searching for inaccurate or duplicate data: The target database should not have duplicate or incomplete data, which is why it is important to test data accuracy as well.
- Verifying business rules: The data uploaded in the target database should comply with the applied business rules.
- Testing performance: This type of test case is also crucial as some forms of data can negatively affect the performance of the system.
- Testing row and table counts: The number of rows and tables in the source should match that of the target database. Any mismatch could point to bugs in the system.
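Several of the checks above can be expressed as short SQL queries. The sketch below assumes an in-memory SQLite database with two hypothetical tables, `src_orders` and `tgt_orders`, standing in for a real source and target. The helper functions are illustrative and not part of any ETL tool, and the table names are trusted constants, which is why they are interpolated directly into the SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_orders (order_id INTEGER, customer TEXT, total REAL);
CREATE TABLE tgt_orders (order_id INTEGER, customer TEXT, total REAL);
INSERT INTO src_orders VALUES (1, 'Alice', 10.0), (2, 'Bob', 20.0), (3, 'Cara', 30.0);
INSERT INTO tgt_orders VALUES (1, 'Alice', 10.0), (2, 'Bob', 20.0), (2, 'Bob', 20.0);
""")


def table_schema(conn, table):
    """Schema validation: return (column name, declared type) pairs."""
    return [(row[1], row[2]) for row in conn.execute(f"PRAGMA table_info({table})")]


def row_count(conn, table):
    """Row count check: number of rows in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


def duplicate_keys(conn, table, key):
    """Duplicate check: key values that appear more than once."""
    query = (
        f"SELECT {key} FROM {table} "
        f"GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    )
    return [row[0] for row in conn.execute(query)]


schema_matches = table_schema(conn, "src_orders") == table_schema(conn, "tgt_orders")
counts_match = row_count(conn, "src_orders") == row_count(conn, "tgt_orders")
duplicates = duplicate_keys(conn, "tgt_orders", "order_id")

print(f"schema_matches={schema_matches} counts_match={counts_match} duplicates={duplicates}")
```

The sample data is deliberately broken: the schemas and row counts match, yet order 2 was loaded twice and order 3 is missing. That is why a row count check alone is not enough and should be paired with a duplicate check.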
How To Write ETL Test Cases
The concept of ETL testing applies to different databases and tools within the information management sector. Since the objective of ETL testing is to ensure that the data from a source is accurate, it is only normal for information to be verified at various stages.
As users perform ETL testing, these two documents always come in handy:
- ETL mapping sheets: These contain all the data regarding destination and source tables, including the necessary columns and the reference tables.
- DB schema of source and target: This document is always kept ready as it is used for verifying any information within the mapping sheets.
With that said, these are the most prevalent ETL test scenarios and test cases used today:
- Mapping doc validation: The mapping doc is checked to confirm that the respective ETL details are provided.
- Validation: The source and target table structures are verified against the mapping doc, and the source and target data types should be similar.
- Constraint validation: This is to make sure that the constraints are defined for a specific table.
- Data quality: Number, date, precision, data, and null checks are made.
- Date validation: This is done to identify active records according to the ETL development perspective.
- Data cleanliness: All unnecessary columns have to be removed before the data is loaded into the staging area.
- Duplicate check: The unique key, the primary key, and any other columns that the business requirements define as unique should contain no duplicates.
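As an illustration of the data quality test case, the sketch below runs null, date, and precision checks over a few staged rows. The row layout, the `check_row` helper, the date format, and the two-decimal limit are all assumptions made for this example; real rules would come from the mapping sheet and the business requirements.

```python
from datetime import datetime
from decimal import Decimal, InvalidOperation

# Hypothetical staged rows, as they might arrive from an extract.
rows = [
    {"id": "1", "order_date": "2023-01-15", "amount": "19.99"},
    {"id": "2", "order_date": "2023-02-30", "amount": "5.00"},   # impossible date
    {"id": "3", "order_date": "2023-03-01", "amount": "7.123"},  # too many decimals
    {"id": "4", "order_date": "2023-03-02", "amount": None},     # missing value
]


def check_row(row, max_scale=2):
    """Return a list of data quality problems found in one row."""
    # Null check: every column must carry a value.
    problems = [f"{col}: null" for col, val in row.items() if val in (None, "")]

    # Date check: the value must be a real calendar date in YYYY-MM-DD form.
    if row.get("order_date"):
        try:
            datetime.strptime(row["order_date"], "%Y-%m-%d")
        except ValueError:
            problems.append("order_date: invalid date")

    # Number and precision check: numeric, with at most max_scale decimals.
    if row.get("amount"):
        try:
            value = Decimal(row["amount"])
        except InvalidOperation:
            problems.append("amount: not a number")
        else:
            if not value.is_finite():
                problems.append("amount: not a number")
            elif -value.as_tuple().exponent > max_scale:
                problems.append(f"amount: more than {max_scale} decimal places")
    return problems


report = {row["id"]: check_row(row) for row in rows}
for row_id, problems in report.items():
    print(row_id, problems or "ok")
```

Returning a list of problems per row, rather than stopping at the first failure, makes it easier to produce the defect report that testers hand back to the ETL developers.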
8 Stages of the ETL Testing Process
Good ETL testing is capable of identifying issues, inconsistencies, and ambiguities with the data source as early as possible. The whole process can be broken down into the following stages:
- Identify business requirements: This is where the design, business flow, and reporting needs are assessed according to client expectations. Identifying business requirements is important because it helps define a clear scope of the project.
- Validate data sources: A data count check is done, and the table and column data are verified to see if they meet the data model’s specifications. This also ensures that check keys are all in place and that any duplicate data is removed.
- Design test cases: This is the stage where ETL mapping scenarios are designed. SQL scripts are also created here and transformational rules are defined.
- Extract data from source systems: The ETL tests are done according to business requirements. Bugs or defects are identified during testing and testers generate a report afterward.
- Apply transformation logic: This is to ensure that the data is transformed to fit the schema of the target data warehouse.
- Load data into target warehouse: A record count check is done before and after the data has been moved from the staging area to the data warehouse.
- Summary report: This is the stage where the layout, options, and filters are verified, as well as the export functionality of the summary report.
- Test closure: Once all stages have been completed, testers then file a test closure to end testing.
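The record count check in the load stage can be sketched as follows. This assumes SQLite as a stand-in for the warehouse and uses hypothetical `staging_sales` and `dw_sales` tables. The idea is simply that the warehouse should grow by exactly the number of staged records.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE staging_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
CREATE TABLE dw_sales (sale_id INTEGER PRIMARY KEY, amount REAL);
INSERT INTO dw_sales VALUES (1, 10.0), (2, 20.0);
INSERT INTO staging_sales VALUES (3, 30.0), (4, 40.0), (5, 50.0);
""")


def count(conn, table):
    """Number of rows currently in the table."""
    return conn.execute(f"SELECT COUNT(*) FROM {table}").fetchone()[0]


# Record counts before the load.
before = count(conn, "dw_sales")
staged = count(conn, "staging_sales")

# Move the data from the staging area to the warehouse.
conn.execute("INSERT INTO dw_sales SELECT sale_id, amount FROM staging_sales")
conn.commit()

# Record count after the load: the difference should equal the staged rows.
after = count(conn, "dw_sales")
load_is_complete = (after - before) == staged
print(f"before={before} staged={staged} after={after} complete={load_is_complete}")
```

If the load step filters or rejects records by design, the expected difference would be the staged count minus the documented rejects, so the comparison would need to account for that.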
Types of ETL Tests
There are nine types of ETL tests that testers can perform. These are:
- Production validation: Also known as production reconciliation, this type of ETL test verifies data in production systems and compares it against the data source.
- Source to target testing: This type of test validates that the number of records loaded into the target database matches the expected record count.
- Source to target data testing: This is performed to ensure the projected data is included within the target system without truncation or loss. It also ensures that the data values meet all expectations after transformation.
- Metadata testing: Carries out data type, index, length, and constraint checks of the ETL application metadata. Data like reconciliation totals and load statistics are assessed here.
- Performance testing: Ensures that the data is being loaded within the data warehouse according to expected time frames. The response of the test server to multiple transactions and users is also tested to make sure it is adequate and scalable.
- Data transformation testing: SQL queries are carried out for this test to validate that the data is transformed correctly.
- Data quality testing: Syntax tests are performed to ensure that the ETL application rejects and reports on invalid data.
- Data integration testing: Verifies that the data from all sources has been correctly loaded to the data warehouse.
- Report testing: This type of testing reviews the data in the summary report and verifies that the layout and functionality work as expected.
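Source to target data testing and data transformation testing are often done with a set-difference query: apply the expected transformation to the source, then subtract what is actually in the target. The sketch below assumes SQLite and two hypothetical tables, with made-up transformation rules (upper-case names and a country-to-code mapping). Anything the query returns is a row that was lost, truncated, or transformed incorrectly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE src_customers (id INTEGER, name TEXT, country TEXT);
CREATE TABLE tgt_customers (id INTEGER, name TEXT, country_code TEXT);
INSERT INTO src_customers VALUES
    (1, 'alice', 'Germany'), (2, 'bob', 'France'), (3, 'christopher', 'Germany');
INSERT INTO tgt_customers VALUES
    (1, 'ALICE', 'DE'), (2, 'BOB', 'FR'), (3, 'CHRISTOPH', 'DE');
""")

# Expected rows (source plus transformation rules) minus actual target rows.
EXPECTED_MINUS_ACTUAL = """
SELECT id,
       UPPER(name),
       CASE country WHEN 'Germany' THEN 'DE' WHEN 'France' THEN 'FR' END
FROM src_customers
EXCEPT
SELECT id, name, country_code FROM tgt_customers
ORDER BY 1
"""

mismatches = conn.execute(EXPECTED_MINUS_ACTUAL).fetchall()
for row in mismatches:
    print("missing or wrong in target:", row)
```

Here the only mismatch is customer 3, whose name was truncated to `CHRISTOPH` during the load. Running the query in the opposite direction (target minus expected) would additionally catch rows that exist in the target but should not.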
Performance Testing in ETL
This is a testing method that is performed to ensure that the ETL system is capable of handling the load from multiple users and transactions. Its primary goal is to improve and optimize session performance by identifying and eliminating any performance bottlenecks.
Informatica is one of the most prevalent tools used in performance testing and tuning.
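Independent of any particular tool, the core of a load performance test is timing a load and comparing it against an agreed time frame. The sketch below is a tool-agnostic illustration using an in-memory SQLite database. It is not how Informatica works, and the row count, the `dw_events` table, and the `MAX_SECONDS` budget are hypothetical values chosen for the example.

```python
import sqlite3
import time

ROW_COUNT = 50_000    # hypothetical batch size
MAX_SECONDS = 30.0    # hypothetical time budget agreed with the business


def timed_load(conn, rows):
    """Load the rows and return the elapsed wall-clock time in seconds."""
    start = time.perf_counter()
    conn.executemany("INSERT INTO dw_events VALUES (?, ?)", rows)
    conn.commit()
    return time.perf_counter() - start


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dw_events (event_id INTEGER PRIMARY KEY, payload TEXT)")

rows = ((i, f"event-{i}") for i in range(ROW_COUNT))
elapsed = timed_load(conn, rows)
loaded = conn.execute("SELECT COUNT(*) FROM dw_events").fetchone()[0]
within_budget = elapsed <= MAX_SECONDS

print(f"loaded {loaded} rows in {elapsed:.3f}s, within budget: {within_budget}")
```

A single run like this only gives one data point. To find bottlenecks, testers would typically repeat the measurement with growing data volumes and concurrent sessions and watch how the elapsed time scales.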
ETL Testing Tools
With all that in mind, here are a few ETL testing tools in use today:
- QuerySurge: QuerySurge is a popular ETL testing tool that allows users to perform the test process automatically. It can support various CI/CD processes and cloud databases.
- Informatica Data Validation: Informatica is an excellent tool for ETL testing as it makes the process easier for people with limited coding skills. The tool offers an intuitive and user-friendly interface, which is why it is one of the most popular ETL testing systems today.
- Datagaps: Another great ETL testing tool is Datagaps as it is capable of performing data extractions and test case executions at the same time.
As you can see, ETL testing is performed to compare how the data in a target database performs and functions against that of the source database. That is why understanding how the source data works is vital when performing the ETL data testing process.
Failing to understand the source data along with its business purpose can result in an unsuccessful ETL testing process. Engineers will also find their SQL skills put to the test throughout ETL testing.
Although it can be quite challenging, ETL testing is an important process for any major enterprise application.