In a traditional ETL or data warehouse solution, you need to ingest data into your data lake from various source systems and cleanse it before it can be processed further by downstream applications. In addition, data migration from on-premises systems to the cloud has become increasingly common.
Virtusa, as a strategic business partner, began by migrating the client’s existing data acquisition framework to AWS using Hadoop modernization techniques. We also used Talend for extract, transform, and load (ETL), leveraging the customer’s serverless data lake framework solutions.
We then proposed a metadata-driven data integration framework, designed to ingest data from any structured data source into any destination simply by adding an entry to a metadata file or table. The framework can ingest data from any structured source system (an RDBMS such as Oracle, local files, FTP server pulls, etc.) and store the data in any destination (Amazon S3, Azure ADLS, Amazon RDS, etc.).
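The core idea can be sketched as a small dispatch pipeline driven entirely by metadata. The sketch below is illustrative only: the field names (`source_type`, `location`, `target_type`, `target_path`) and the in-memory sink are assumptions standing in for the framework's real metadata schema and connectors (JDBC, FTP, S3, ADLS, RDS).

```python
import csv
import io

# Illustrative metadata table: one row per feed. Field names are
# assumptions, not the framework's actual metadata schema.
METADATA = [
    {"feed": "customers", "source_type": "csv",
     "location": "customers.csv",
     "target_type": "memory", "target_path": "raw/customers"},
]

# Pluggable connectors keyed by type: onboarding a new source or
# destination means registering one function, not changing the pipeline.
def read_csv(text):
    return list(csv.DictReader(io.StringIO(text)))

READERS = {"csv": read_csv}

SINK = {}  # stands in for S3/ADLS/RDS in this sketch

def write_memory(path, rows):
    SINK[path] = rows

WRITERS = {"memory": write_memory}

def ingest(entry, payload):
    """Route one feed from its source to its target using metadata only."""
    rows = READERS[entry["source_type"]](payload)
    WRITERS[entry["target_type"]](entry["target_path"], rows)
    return len(rows)

# A new feed is onboarded by appending a metadata row; no code changes.
count = ingest(METADATA[0], "id,name\n1,Ada\n2,Grace\n")
```

In this shape, the pipeline itself never mentions a specific source or destination; everything variable lives in the metadata row.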
This accelerator also supports schema evolution: a schema change in an existing feed has no impact on the framework, so no code change is required. This saves build and testing effort and eliminates the need for impact analysis whenever a source schema changes.
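One way such schema evolution can work is to keep a registry of last-seen columns per feed and merge in any new columns on load. This is a minimal sketch of that pattern; the registry structure and append-only merge policy are assumptions, as the accelerator's actual schema-tracking mechanism is not detailed here.

```python
import csv
import io

# Hypothetical schema registry: last-seen columns per feed.
SCHEMA_REGISTRY = {"customers": ["id", "name"]}

def evolve_schema(feed, incoming_columns):
    """Merge newly observed columns into the registered schema.

    New columns are appended rather than rejected, so a changed feed
    loads without code changes or manual impact analysis.
    """
    known = SCHEMA_REGISTRY.setdefault(feed, [])
    for col in incoming_columns:
        if col not in known:
            known.append(col)
    return known

def load(feed, payload):
    rows = list(csv.DictReader(io.StringIO(payload)))
    cols = evolve_schema(feed, list(rows[0].keys()) if rows else [])
    # Rows missing a newer column get None, so downstream consumers
    # always see a consistent set of fields.
    return [{c: r.get(c) for c in cols} for r in rows]

# The feed adds an "email" column; the loader absorbs it automatically.
rows = load("customers", "id,name,email\n1,Ada,ada@example.com\n")
```

The key property is that the schema change is absorbed at load time, so neither the pipeline code nor existing metadata entries need editing.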