Introduction: What is automation and why is it so important?
We are in the era of artificial intelligence, which is paving the way for minimal human intervention in many real-world activities. Artificial intelligence grew out of the same thinking that drives automation. What sets AI apart is that it does not just repeat the same work: it attempts to keep evolving through cognitive behavior, much as a human being does.
However, automation is ultimately focused on the same goal: making our lives easier by automating error-prone manual activities, reducing cost, time, and effort while improving accuracy.
Automation testing means validating functional systems with scripts and tools instead of human effort. Its success depends on the stability and scalability of the automation framework, which provides the platform on which the automated tests run.
There are several automation testing frameworks and tools on the market, both licensed products such as UFT and TOSCA and open source tools such as Selenium and Cucumber.
Automation testing of data platform – What makes it different from a web application approach?
Automation is well established in the software testing industry, but most of that growth has been in web applications. Automation for data platforms has not been explored nearly as much because of the practical difficulties involved.
Automating data platform validation, especially for a DW or data lake, has always been challenging for several reasons, such as:
- Size of the data being handled (Volume)
- Heterogeneous data from multiple sources (Variety & Veracity)
- Rapidly changing data (Velocity)
Hence, automating a data platform cannot be approached in the same way as automating a web-based application, where the UI drives the tests and the volume of transactional data is small compared to that of analytical platforms.
Automation strategy for data – Best practices involved
The success of automation in data platforms relies on the framework and the strategy, and on correctly scoping what can and cannot be automated.
The general rule of thumb is to start by automating the regression/repetitive cases and then scale up as far as the nature of the data being handled allows.
In an agile environment, CI/CD (continuous integration and continuous delivery) has become the trend for getting solutions to production quickly with maximum built-in quality.
In this setting, time-consuming manual validation hinders quick releases, which creates the need for an efficient test automation process as part of CI/CD.
But keep in mind that this will not replace the complete functional validation of new enhancements. Here, the focus is on automating smoke/regression testing (i.e. repetitive validation).
Hence, the success of automation testing is driven by arriving at the list of test cases and validation points that are feasible to automate.
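As a rough illustration of a repeatable smoke check that a CI/CD job could run on every build, here is a minimal sketch using Python's built-in unittest and sqlite3 modules. The in-memory database, the `sales` table, and the `build_connection` and `run_smoke_suite` helpers are hypothetical stand-ins; a real suite would connect to the actual platform.

```python
import sqlite3
import unittest


def build_connection():
    """Hypothetical stand-in for a real platform connection (in-memory SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, amount REAL NOT NULL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", [(1, 10.0), (2, 25.5)])
    return conn


class SmokeTests(unittest.TestCase):
    """Cheap, repetitive checks that are good candidates for automation."""

    def setUp(self):
        self.conn = build_connection()

    def tearDown(self):
        self.conn.close()

    def test_table_exists(self):
        row = self.conn.execute(
            "SELECT COUNT(*) FROM sqlite_master WHERE type = 'table' AND name = 'sales'"
        ).fetchone()
        self.assertEqual(row[0], 1)

    def test_table_not_empty(self):
        row = self.conn.execute("SELECT COUNT(*) FROM sales").fetchone()
        self.assertGreater(row[0], 0)


def run_smoke_suite():
    """Run the smoke tests programmatically and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(SmokeTests)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

A pipeline step could call `run_smoke_suite()` and fail the build when `result.wasSuccessful()` is false; how that is wired up depends on the CI tool in use.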
What can be considered in the scope of the Data automation suite?
- Metadata validation of the table
- Count reconciliation between source and target tables for the initial load
- Duplicate check validation
- Key column validation
- Null/Not null constraint validation
- SCD – slowly changing dimension validation (in the case of Data warehouse)
- Data integrity between facts and dimensions (in the case of Data warehouse)
- Partition logic validation (in the case of Data lake)
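Several of the checks listed above reduce to short SQL queries. The following is a minimal sketch, assuming a SQL-accessible platform; it uses Python's built-in sqlite3 module only so that it runs anywhere. All table names, column names, and helper functions are hypothetical, and the demo data is deliberately seeded with defects so each check has something to find.

```python
import sqlite3


def setup_demo(conn):
    """Create tiny hypothetical source, dimension, and fact tables with known defects."""
    conn.executescript(
        """
        CREATE TABLE src_customer (customer_id INTEGER, name TEXT);
        CREATE TABLE dim_customer (customer_id INTEGER, name TEXT);
        CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, amount REAL);
        INSERT INTO src_customer VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cid');
        INSERT INTO dim_customer VALUES (1, 'Ann'), (2, 'Bob'), (2, 'Bob'), (3, NULL);
        INSERT INTO fact_sales VALUES (10, 1, 5.0), (11, 2, 7.5), (12, 9, 1.0);
        """
    )

# Note: table and column names are interpolated into the SQL below, so they
# must come from trusted test configuration, never from user input.


def table_columns(conn, table):
    """Metadata validation: return (column name, declared type) pairs for a table."""
    return [(r[1], r[2]) for r in conn.execute(f"PRAGMA table_info({table})")]


def count_mismatch(conn, source, target):
    """Count reconciliation: target row count minus source row count (0 means a match)."""
    src = conn.execute(f"SELECT COUNT(*) FROM {source}").fetchone()[0]
    tgt = conn.execute(f"SELECT COUNT(*) FROM {target}").fetchone()[0]
    return tgt - src


def duplicate_keys(conn, table, key):
    """Duplicate check: key values that appear more than once."""
    rows = conn.execute(
        f"SELECT {key} FROM {table} GROUP BY {key} HAVING COUNT(*) > 1 ORDER BY {key}"
    ).fetchall()
    return [r[0] for r in rows]


def null_count(conn, table, column):
    """Not-null validation: number of rows where the column is NULL."""
    return conn.execute(
        f"SELECT COUNT(*) FROM {table} WHERE {column} IS NULL"
    ).fetchone()[0]


def orphan_keys(conn, fact, dim, key):
    """Fact/dimension integrity: non-null fact keys with no matching dimension row."""
    rows = conn.execute(
        f"SELECT DISTINCT f.{key} FROM {fact} f "
        f"LEFT JOIN {dim} d ON f.{key} = d.{key} "
        f"WHERE d.{key} IS NULL AND f.{key} IS NOT NULL ORDER BY f.{key}"
    ).fetchall()
    return [r[0] for r in rows]
```

Each helper returns plain values rather than raising, so the same functions can feed either test assertions or a reconciliation report. SCD and partition logic checks are left out here because they depend heavily on how a given platform models history and partitions.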
Challenges in automating the data platform – What cannot be automated?
A DW architecture involves a complex ETL process that draws data from multiple input sources, so creating a robust automation script to handle these complex transformations is challenging and time-consuming.
Data quality is a key factor in any solution such as a DW or data lake, but the set of possible data quality rules is open-ended, so it cannot be captured as a finite list in an automation script.
As discussed earlier, volume will always be a challenge. Although there are powerful libraries and tools on the market for handling huge volumes of data, it remains one of the most significant challenges.
Conclusion: Start Automation testing on your data platform
Automation is a crucial component of the software testing industry. However, for data-related solutions, automation testing still lags behind web-based applications. It is true that several limitations have caused this slow evolution.
We can design an efficient testing framework for data platforms by choosing the right test cases to automate and the right tools to automate them with.
We hope this discussion sparks some ideas about the automation of data platforms.