What does database testing really mean?
Database testing is an important aspect of data integrity. It ensures the accuracy, consistency, and currency of your information by making sure that all records are loaded into target tables without any loss or change during the migration. When you load source databases with new records while also checking for potential errors in design before they happen.
This method helps us confirm transformations applied on resultant objects after running our queries against those same sources so there won’t be surprises later down the road!
Validating data in the tables becomes very cumbersome if we do it manually. Apart from the longer duration ensuring the accuracy of the data being validated is the biggest challenge. When data loads, there is a chance that we might need to validate it each time. This can be tedious and repetitive in nature as well since validation of the records happens only during the QA process, but you’re guaranteed an increase for any unwanted listings such as nulls or duplicates which could get into your table if not careful enough with loading properly!
The best way to tackle this scenario is to automate this task which will ensure the data integrity is intact, duplicates are omitted, and the task is completed in a fraction of the time than it used to be.
To achieve this, we can utilize the rundeck application to include SQL.
Rundeck is an open-source automation service with a web console, command line tools, and a WebAPI. It lets you run automation tasks across a set of nodes, easily. Every time a table is loaded with new data, the SQL scripts will be run. Once the batch closes these scripts having SQL queries to validate the counts and data, it will catch the data discrepancy, if any, and will send a notification via email.
Let’s see a practical example of how the scripts can be included after the loading job completes by choosing the command option. Before adding the data validation SQL script, the script should be pushed from the local repo through GitHub.
The sample piece of the script below compares the data count between the source and target and if there is any data mismatch then emails will be triggered.
Example of a script that checks the nulls in the columns:
Example of a script that checks the duplicates in the columns:
While setting up your script you may consider the below points:
- Connection to both source & target DBs should be configured with the rundeck application or should be included in the script e.g.:
Conn_a: to connect with sourcedb
Conn_b: to connect with targetdb
Lable: Check the name that we can specify
Type: Should be configured with the rundeck using SQLRunner plugin.
SQLRunner Plugin is a node step plugin included with rundeck that executes a SQL script.
Warning threshold: If the count difference crosses 100 then the email received subject will be with warning for so and so table
Failure threshold: If the count difference crosses 1000 then the email received subject will be with warning for so and so table