Data transformation is the process of converting data from one format or structure to another, for example from raw to processed data, or from unstructured to structured data. Transformation processes can be automated, manual, or a combination of the two. Typically, data transformation runs as a batch process in which data scientists develop rules and code that identify which data needs to be changed. It is often used as part of data integration and data warehousing.
Data integration is the process of taking information from different sources and presenting a unified view of it, often in a data warehouse. Data warehousing often relies on an ETL (extract, transform, load) approach. ETL tools copy data from one or more sources in three steps: they extract the data, transform it so that it meets the requirements of the target system (for example, by converting data types or file formats, or by translating coded values), and finally load it into the end target, the data warehouse.
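The three ETL steps can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions, not a description of any particular ETL tool: the sample CSV data, the status-code mapping, and the customers table are all hypothetical, and an in-memory SQLite database stands in for the data warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source data: a CSV export with string fields and coded values.
RAW_CSV = """id,signup_date,status_code,amount
1,2023-01-15,A,19.99
2,2023-02-03,I,5.00
3,2023-02-20,A,42.50
"""

# Assumed meaning of the coded "status_code" column.
STATUS_LABELS = {"A": "active", "I": "inactive"}


def extract(text):
    """Extract: read rows from the source as dictionaries of strings."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: convert data types and translate coded values."""
    return [
        (
            int(r["id"]),
            r["signup_date"],
            STATUS_LABELS[r["status_code"]],
            float(r["amount"]),
        )
        for r in rows
    ]


def load(records, conn):
    """Load: write the transformed records into the target table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS customers "
        "(id INTEGER PRIMARY KEY, signup_date TEXT, status TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?)", records)
    conn.commit()


conn = sqlite3.connect(":memory:")  # stand-in for a data warehouse
load(transform(extract(RAW_CSV)), conn)
rows = conn.execute(
    "SELECT id, status, amount FROM customers ORDER BY id"
).fetchall()
print(rows)
```

Keeping each step in its own function means the transformation rules can be changed or tested without touching the code that reads from the source or writes to the target.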
Data transformation is also used as part of data management. Data scientists may need to convert file types for specific analyses or as part of long-term storage plans. Manually converting file types is time-consuming and prone to errors, so data transformation tools can automate the process and help preserve data quality. These tools can access large data sets and quickly transform them as needed for the task at hand. They can also run data discovery to find only the files or formats relevant to the transformation, then convert the data that matches the requirements and leave the rest alone.
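As a rough sketch of discovery followed by selective conversion, the Python example below searches a directory for CSV files, writes a JSON copy of each one, and ignores every other file. The function name convert_csv_files, the CSV-to-JSON choice, and the sample files are assumptions made for illustration; a real tool would match on whatever formats the task requires.

```python
import csv
import json
import tempfile
from pathlib import Path


def convert_csv_files(root):
    """Find CSV files under root and write a JSON copy next to each one.

    Files with any other extension are left untouched.
    """
    converted = []
    for src in sorted(Path(root).rglob("*.csv")):  # discovery step
        with src.open(newline="") as f:
            rows = list(csv.DictReader(f))
        dest = src.with_suffix(".json")
        dest.write_text(json.dumps(rows, indent=2))
        converted.append(dest)
    return converted


# Demo on a throwaway directory with one matching and one non-matching file.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sales.csv").write_text("region,total\nnorth,10\nsouth,7\n")
    (root / "notes.txt").write_text("not a data file\n")

    outputs = convert_csv_files(root)
    converted_names = [p.name for p in outputs]
    sales = json.loads((root / "sales.json").read_text())
    remaining = sorted(p.name for p in root.iterdir())

print(converted_names)
print(remaining)
```

Note that csv.DictReader returns every value as a string, so a fuller version would also convert column types as part of the same pass.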
It is also possible to use interactive data transformations, which let organizations work directly with data sets through a visual representation. This helps organizations better understand the characteristics of the data and lets them change or correct the transformations as needed.
By transforming data, organizations can: