Data preparation

[2] The issues to be dealt with fall into two main categories.

The first step is to set out a full and detailed specification of the format of each data field and of what its entries mean.
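As a minimal sketch of what such a field specification might look like in executable form, the Python below pairs each field with a plain-language description and a validity check. The field names (`customer_id`, `country`, `signup_date`), their formats, and the helper `find_violations` are hypothetical and chosen only for illustration; a real specification would come from the owners of the data.

```python
import re
from dataclasses import dataclass
from datetime import date
from typing import Callable


@dataclass(frozen=True)
class FieldSpec:
    """One entry of the specification: field name, meaning, and format check."""
    name: str
    description: str
    is_valid: Callable[[str], bool]


def is_iso_date(value: str) -> bool:
    """Accept only YYYY-MM-DD strings that name a real calendar date."""
    if re.fullmatch(r"[0-9]{4}-[0-9]{2}-[0-9]{2}", value) is None:
        return False
    try:
        date.fromisoformat(value)
        return True
    except ValueError:
        return False


# Hypothetical specification; the fields and formats are assumptions.
SPEC = [
    FieldSpec("customer_id", "Unique customer identifier, exactly 8 digits",
              lambda v: re.fullmatch(r"[0-9]{8}", v) is not None),
    FieldSpec("country", "Two upper-case letters identifying the country",
              lambda v: re.fullmatch(r"[A-Z]{2}", v) is not None),
    FieldSpec("signup_date", "Date the customer signed up, as YYYY-MM-DD",
              is_iso_date),
]


def find_violations(record: dict) -> list:
    """Return the names of fields that are missing or break the specification."""
    problems = []
    for spec in SPEC:
        value = record.get(spec.name)
        if value is None or not spec.is_valid(value):
            problems.append(spec.name)
    return problems
```

Calling `find_violations` on each incoming record gives a list of the fields that need attention, which can then be reviewed or corrected before further processing.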

Where possible and economical, data should be verified against an authoritative source (e.g. business information can be checked against a D&B database to ensure accuracy).
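The idea of checking records against an authoritative source can be sketched as a simple lookup and comparison. In the Python below, the in-memory `REFERENCE` table is a hypothetical stand-in for an authoritative database; it does not represent the interface of any real provider, and the field names (`registry_id`, `name`, `postcode`) are assumptions made for illustration.

```python
# Hypothetical reference data standing in for an authoritative source.
REFERENCE = {
    "100000001": {"name": "Acme Ltd", "postcode": "AB1 2CD"},
    "100000002": {"name": "Globex Corporation", "postcode": "EF3 4GH"},
}


def normalise(text: str) -> str:
    """Upper-case the text and collapse runs of whitespace before comparing."""
    return " ".join(text.upper().split())


def verify_record(record: dict, reference: dict) -> tuple:
    """Compare a record with its reference entry.

    Returns ("not_found", []) when the identifier is unknown,
    ("match", []) when every reference field agrees, and
    ("mismatch", [field, ...]) listing the fields that disagree.
    """
    ref = reference.get(record.get("registry_id"))
    if ref is None:
        return ("not_found", [])
    mismatched = [
        field for field in ref
        if normalise(record.get(field, "")) != normalise(ref[field])
    ]
    return ("mismatch", mismatched) if mismatched else ("match", [])
```

Normalising case and whitespace before comparing avoids flagging purely cosmetic differences; whether fuzzier matching is appropriate depends on the data and on the cost of a false match.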

Several companies, such as Paxata, Trifacta, Alteryx, Talend, and Ataccama, provide visual interfaces that display the data and allow the user to directly explore, structure, clean, augment, and update sample data provided by the user.

Once the preparation work is complete, the underlying steps can be run on other datasets to perform the same operations.

This reuse provides a significant productivity boost when compared to more traditional manual and hand-coding methods for data preparation.
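The reuse of preparation steps can be illustrated with a short Python sketch in which each step is an ordinary function and the recorded sequence of steps is applied unchanged to any dataset. The step names, the `PIPELINE` list, and the sample fields are hypothetical; the sketch assumes every value is a string and does not represent how any particular commercial tool stores or replays its steps.

```python
def strip_whitespace(rows: list) -> list:
    """Trim leading and trailing whitespace from every value."""
    return [{key: value.strip() for key, value in row.items()} for row in rows]


def drop_empty_names(rows: list) -> list:
    """Discard rows whose 'name' field is missing or empty."""
    return [row for row in rows if row.get("name")]


def uppercase_country(rows: list) -> list:
    """Upper-case the 'country' field where it is present."""
    return [
        {**row, "country": row["country"].upper()} if "country" in row else row
        for row in rows
    ]


# The recorded preparation steps, in the order they should run.
PIPELINE = [strip_whitespace, drop_empty_names, uppercase_country]


def run_pipeline(rows: list, steps: list = PIPELINE) -> list:
    """Apply each step in turn and return the prepared rows."""
    for step in steps:
        rows = step(rows)
    return rows


# The same steps run unchanged on two different datasets.
january = [{"name": " Ada ", "country": "gb "}, {"name": "  ", "country": "fr"}]
february = [{"name": "Grace", "country": " us"}]
prepared_january = run_pipeline(january)
prepared_february = run_pipeline(february)
```

Because the steps are defined once and applied by `run_pipeline`, a correction to any single step takes effect for every dataset processed afterwards.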