A virtual data pipeline is a set of processes that takes raw data from different sources, transforms it into a format that applications can use, and stores it in a destination such as a database. The workflow can run on a schedule or on demand, and it can be complex, with many steps and dependencies. It should be easy to trace the relationships between steps so you can confirm the pipeline is working correctly.
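As a rough illustration of what "steps and dependencies" means in practice, the sketch below wires three hypothetical steps (extract, clean, load) into a dependency map and runs them in order. The step names and the run_pipeline helper are assumptions for the example, not part of any particular product or scheduler.

```python
# Minimal sketch of a pipeline with named steps and explicit dependencies.
from graphlib import TopologicalSorter

def extract():
    print("extracting raw data from sources")

def clean():
    print("cleansing and validating data")

def load():
    print("loading data into the destination")

# Map each step to the steps it depends on, so the relationships
# are explicit and easy to trace when something misbehaves.
steps = {"extract": set(), "clean": {"extract"}, "load": {"clean"}}
tasks = {"extract": extract, "clean": clean, "load": load}

def run_pipeline():
    # Run steps in dependency order; a real scheduler would also
    # retry, log, and trigger runs on a timetable or on demand.
    for name in TopologicalSorter(steps).static_order():
        tasks[name]()

if __name__ == "__main__":
    run_pipeline()
```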
Once the data has been ingested, initial cleansing and validation take place. The data may also be transformed in a secure, organized environment using processes such as normalization, enrichment, aggregation, filtering, and masking. This is an essential step, since it ensures that only reliable, accurate data is used for analytics and applications.
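The following sketch shows what a few of these transformations might look like on a small in-memory record set: validation, normalization, masking, and a simple aggregation. The field names (email, country, amount) are invented for the example.

```python
# Illustrative transformation stage: validate, normalize, mask, aggregate.
from collections import defaultdict

records = [
    {"email": "ann@example.com", "country": "us", "amount": "120.50"},
    {"email": "bob@example.com", "country": "US ", "amount": "80"},
    {"email": "bad-row", "country": "DE", "amount": "not-a-number"},
]

def validate(row):
    # Keep only rows with a plausible email and a numeric amount.
    try:
        float(row["amount"])
    except ValueError:
        return False
    return "@" in row["email"]

def normalize(row):
    # Standardize country codes and convert amounts to numbers.
    return {**row, "country": row["country"].strip().upper(),
            "amount": float(row["amount"])}

def mask(row):
    # Mask personally identifiable data before it reaches analytics.
    user, _, domain = row["email"].partition("@")
    return {**row, "email": user[0] + "***@" + domain}

clean = [mask(normalize(r)) for r in records if validate(r)]

# Aggregate: total amount per country.
totals = defaultdict(float)
for row in clean:
    totals[row["country"]] += row["amount"]
print(dict(totals))  # {'US': 200.5}
```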
The data is then consolidated and moved to its final storage location, where it is available for analysis. That destination could be a database with an established structure, such as a data warehouse, or a less structured data lake.
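A minimal sketch of the load step is shown below, using SQLite from the Python standard library as a stand-in for a structured destination; the table and column names are assumptions made for the example.

```python
# Load consolidated records into a structured table that analysts
# and applications can query directly. SQLite stands in for a warehouse.
import sqlite3

rows = [("US", 200.5), ("DE", 75.0)]

conn = sqlite3.connect("warehouse.db")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sales_by_country (country TEXT, total REAL)"
)
conn.executemany("INSERT INTO sales_by_country VALUES (?, ?)", rows)
conn.commit()
conn.close()
```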
To accelerate deployment and improve business intelligence, it is often desirable to use a hybrid architecture in which data moves between on-premises and cloud storage. IBM Virtual Data Pipeline is well suited to this, since it offers a multi-cloud copy solution that keeps development and testing environments separate. VDP uses snapshots and changed-block tracking to capture application-consistent copies of data and makes them available to developers through a self-service interface.
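To give a sense of the general idea behind changed-block tracking (this is a conceptual sketch, not VDP's actual mechanism): a file is split into fixed-size blocks, each block is hashed, and only blocks whose hashes differ from the previous snapshot need to be copied.

```python
# Conceptual changed-block detection: hash fixed-size blocks and compare
# against the hashes recorded at the previous snapshot.
import hashlib

BLOCK_SIZE = 4096

def block_hashes(path):
    hashes = []
    with open(path, "rb") as f:
        while block := f.read(BLOCK_SIZE):
            hashes.append(hashlib.sha256(block).hexdigest())
    return hashes

def changed_blocks(old_hashes, new_hashes):
    # Indices of blocks that were added or modified since the previous
    # snapshot; only these blocks need to be transferred.
    return [i for i, h in enumerate(new_hashes)
            if i >= len(old_hashes) or old_hashes[i] != h]
```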