Data transformation (computing)

Data transformation is typically performed via a mixture of manual and automated steps.

These steps are often the focus of developers or technical data analysts who may use multiple specialized tools to perform their tasks.[4]

Typically, the data transformation technologies generate the transformation code[5] based on the definitions or metadata defined by the developers.
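
As an illustration, the sketch below shows how such a technology might turn declarative mapping metadata into executable SQL. The mapping structure, column names and generate_sql function are hypothetical examples, not the interface of any particular product.

# Sketch: generating a SQL transformation from declarative mapping metadata.
# The mapping layout and all names below are invented for illustration.
mapping = {
    "source_table": "raw_customers",
    "target_table": "dim_customers",
    "columns": {
        # target column: source expression
        "customer_id": "id",
        "full_name": "first_name || ' ' || last_name",
        "signup_date": "CAST(created_at AS DATE)",
    },
}

def generate_sql(mapping: dict) -> str:
    """Emit an INSERT ... SELECT statement from the mapping definition."""
    targets = ", ".join(mapping["columns"])
    exprs = ",\n    ".join(
        f"{expr} AS {target}" for target, expr in mapping["columns"].items()
    )
    return (
        f"INSERT INTO {mapping['target_table']} ({targets})\n"
        f"SELECT\n    {exprs}\n"
        f"FROM {mapping['source_table']};"
    )

print(generate_sql(mapping))   # prints a complete INSERT ... SELECT statement

The point of the pattern is that developers maintain only the mapping metadata; the executable code is regenerated from it whenever the definitions change.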

It is typically the business user or final end-user of the data who performs this step.[1]

When data must be transformed and delivered with low latency, the term "microbatch" is often used.[8]
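
A minimal sketch of the microbatch pattern, assuming a generic record stream: records are buffered and handed off as a small batch once either a size limit or a latency deadline is reached. The class name, thresholds and flush callback are illustrative assumptions.

import time

# Sketch of microbatching: buffer incoming records and flush them as a
# small batch once either a size limit or a latency deadline is reached.
class MicroBatcher:
    def __init__(self, flush, max_size=100, max_latency_s=1.0):
        self.flush = flush                # callback that transforms/delivers a batch
        self.max_size = max_size
        self.max_latency_s = max_latency_s
        self.buffer = []
        self.first_arrival = None

    def add(self, record):
        if self.first_arrival is None:
            self.first_arrival = time.monotonic()
        self.buffer.append(record)
        age = time.monotonic() - self.first_arrival
        if len(self.buffer) >= self.max_size or age >= self.max_latency_s:
            self.flush(self.buffer)       # hand off the microbatch
            self.buffer = []
            self.first_arrival = None

batcher = MicroBatcher(flush=print, max_size=3)
for record in ["a", "b", "c", "d"]:
    batcher.add(record)                   # prints ['a', 'b', 'c'] on the third add

A production version would also flush on a background timer, so that a lull in the stream cannot hold records past the latency deadline.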

This process leaves the bulk of the work of defining the required transformations to the developer, who in turn often does not have the same domain knowledge as the business user.

This has the potential to introduce errors into the process (through misinterpreted requirements), and also increases the time to arrive at a solution.

There are a number of companies that provide interactive data transformation tools, including Trifacta, Alteryx and Paxata.[14] They are aiming to efficiently analyze, map and transform large volumes of data while at the same time abstracting away some of the technical complexity and processes which take place under the hood. While these companies use traditional batch transformation, their tools enable more interactivity for users through visual platforms and easily repeated scripts.
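
The "easily repeated scripts" such tools emit are, in spirit, short rerunnable transformation programs. The pandas sketch below is only an analogy for that output, not any vendor's actual format; the file names, columns and cleaning steps are invented for illustration.

import pandas as pd

# Sketch of a rerunnable transformation script, similar in spirit to what an
# interactive tool records while the user clicks through the data.
def transform(df: pd.DataFrame) -> pd.DataFrame:
    df = df.drop_duplicates().copy()
    df["name"] = df["name"].str.strip().str.title()
    df["signup_date"] = pd.to_datetime(df["signup_date"], errors="coerce")
    return df.dropna(subset=["signup_date"])

raw = pd.read_csv("customers.csv")                 # hypothetical input file
transform(raw).to_csv("customers_clean.csv", index=False)

Because the steps are captured as a script, the same cleanup can be replayed unchanged on the next batch of data.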

Many transformation languages require a grammar to be provided; in many cases, the grammar is structured using something closely resembling Backus–Naur form (BNF).
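
For instance, a minimal field-mapping rule language could be specified with a BNF-style grammar along the following lines; this grammar is illustrative and not drawn from any particular transformation language.

<rule>       ::= <target> "=" <expression>
<target>     ::= <identifier>
<expression> ::= <term> | <expression> "+" <term>
<term>       ::= <field> | <literal>
<field>      ::= "$" <identifier>
<literal>    ::= "'" <text> "'"

Here <identifier> and <text> are lexical terminals. Under this grammar, a rule such as full_name = $first + ' ' + $last would concatenate two source fields into one target field.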

The development of domain-specific languages has been linked to increased productivity and accessibility for non-technical users.

Text editors like vim, emacs or TextPad support the use of regular expressions with arguments, which allows all instances of a particular pattern to be replaced with another pattern built from parts of the original match.
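
For example, Python's re module offers the same kind of replacement with captured arguments (backreferences) that these editors expose:

import re

# Swap every "Last, First" name into "First Last" using capture groups;
# \1 and \2 are the arguments: captured parts of the match reused in the result.
text = "Doe, Jane\nSmith, John"
print(re.sub(r"(\w+), (\w+)", r"\2 \1", text))
# prints:
# Jane Doe
# John Smith

In vim the equivalent substitution is :%s/\(\w\+\), \(\w\+\)/\2 \1/, applied to every line of the buffer.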