Data profiling

The result of the analysis is used to determine the suitability of the candidate source systems, usually providing the basis for an early go/no-go decision, and also to identify problems for later solution design.[4][5]

Data profiling uses descriptive statistics such as minimum, maximum, mean, mode, percentiles, standard deviation, frequency, and variation, as well as aggregates such as count and sum. It also collects additional metadata, such as data type, length, discrete values, uniqueness, occurrence of null values, typical string patterns, and abstract type recognition.[3]
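A minimal single-column profile along these lines can be sketched in Python with only the standard library. This is an illustration, not a reference implementation: the function names `profile_column` and `value_pattern` are hypothetical, `None` is assumed to mark a null value, and the "typical string pattern" is approximated by mapping ASCII letters to `A` and digits to `9`.

```python
import re
import statistics
from collections import Counter


def value_pattern(value):
    """Reduce a string to its shape: ASCII letters -> 'A', digits -> '9', other characters kept."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(values):
    """Build a basic profile of one column; None is treated as a null value (assumption)."""
    non_null = [v for v in values if v is not None]
    distinct = set(non_null)
    profile = {
        "count": len(values),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(distinct),
        "is_unique": len(distinct) == len(non_null),
        # Frequency distribution: the three most common values.
        "top_values": Counter(non_null).most_common(3),
    }
    if non_null and all(isinstance(v, (int, float)) for v in non_null):
        profile.update(
            data_type="numeric",
            min=min(non_null),
            max=max(non_null),
            sum=sum(non_null),
            mean=statistics.mean(non_null),
            median=statistics.median(non_null),  # the 50th percentile
            mode=statistics.mode(non_null),
            # The sample standard deviation needs at least two values.
            stdev=statistics.stdev(non_null) if len(non_null) > 1 else 0.0,
        )
    elif non_null and all(isinstance(v, str) for v in non_null):
        lengths = [len(v) for v in non_null]
        profile.update(
            data_type="string",
            min_length=min(lengths),
            max_length=max(lengths),
            # Most frequent value shapes, e.g. "AA-99" for "AB-12".
            top_patterns=Counter(value_pattern(v) for v in non_null).most_common(3),
        )
    else:
        profile["data_type"] = "mixed or empty"
    return profile


print(profile_column([3, 1, 4, 1, 5, None]))
print(profile_column(["AB-12", "CD-34", "x1", None]))
```

In practice a profiling tool would run a function like this over every column of every candidate source table; the per-type branching mirrors the idea that numeric statistics and string-pattern metadata apply to different kinds of columns.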

Finally, an inter-table analysis can explore overlapping value sets that may represent foreign key relationships between entities.
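One common way to explore such overlaps is to test for inclusion: a column is a foreign key candidate if (nearly) all of its distinct values appear in a column of another table whose values are unique. The sketch below assumes each table is a hypothetical column-oriented `dict` of lists; `inclusion_ratio`, `foreign_key_candidates`, and the `threshold` parameter are illustrative names, not part of any particular tool.

```python
def is_unique(values):
    """True if the non-null values contain no duplicates (a candidate key)."""
    non_null = [v for v in values if v is not None]
    return len(non_null) == len(set(non_null))


def inclusion_ratio(child_values, parent_values):
    """Fraction of distinct non-null child values that also occur in the parent column."""
    child = {v for v in child_values if v is not None}
    parent = {v for v in parent_values if v is not None}
    if not child:
        return 0.0
    return len(child & parent) / len(child)


def foreign_key_candidates(tables, threshold=1.0):
    """List (child_table, child_col, parent_table, parent_col, ratio) tuples for
    column pairs in different tables whose overlap reaches the threshold."""
    columns = [(t, c, vals) for t, cols in tables.items() for c, vals in cols.items()]
    candidates = []
    for child_table, child_col, child_vals in columns:
        for parent_table, parent_col, parent_vals in columns:
            # Only compare across tables, and only against unique parent columns.
            if child_table == parent_table or not is_unique(parent_vals):
                continue
            ratio = inclusion_ratio(child_vals, parent_vals)
            if ratio >= threshold:
                candidates.append((child_table, child_col, parent_table, parent_col, ratio))
    return candidates


tables = {
    "customers": {"customer_id": [1, 2, 3], "name": ["Ann", "Bob", "Cy"]},
    "orders": {"order_id": [10, 11, 12, 13], "customer_id": [1, 1, 3, None]},
}
print(foreign_key_candidates(tables))
```

A threshold below 1.0 tolerates a few orphaned values, which is often how dirty source data actually looks; such pairs are only candidates and still need confirmation by someone who knows the source system.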

A light profiling assessment should be undertaken immediately after candidate source systems have been identified and the data warehouse/business intelligence (DW/BI) business requirements have been gathered.

The purpose of this initial analysis is to clarify at an early stage whether the correct data is available at the appropriate level of detail and whether anomalies can be handled later.
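A light assessment of this kind could be automated as a few coarse checks that feed the go/no-go decision. The sketch below is hypothetical throughout: the function `light_assessment`, the 5% `max_null_rate`, and the use of a unique combination of `grain_columns` as a stand-in for "the appropriate level of detail" are assumptions chosen for illustration. Tables are again column-oriented `dict`s of equal-length lists.

```python
def light_assessment(table, required_columns, grain_columns=(), max_null_rate=0.05):
    """Return (go, issues) for a candidate source table using hypothetical criteria."""
    issues = []
    # Is the required data available, and is it populated well enough?
    for col in required_columns:
        if col not in table:
            issues.append(f"missing column: {col}")
            continue
        values = table[col]
        rate = sum(v is None for v in values) / len(values) if values else 1.0
        if rate > max_null_rate:
            issues.append(f"{col}: null rate {rate:.0%} exceeds {max_null_rate:.0%}")
    # Is the data at the expected grain? Each combination of grain columns must be unique.
    if grain_columns and all(c in table for c in grain_columns):
        rows = list(zip(*(table[c] for c in grain_columns)))
        if len(rows) != len(set(rows)):
            issues.append(f"grain {tuple(grain_columns)} is not unique")
    return (not issues, issues)


source = {"order_id": [1, 2, 2], "line_no": [1, 1, 2], "amount": [9.5, None, 3.0]}
go, issues = light_assessment(
    source, ["order_id", "amount", "customer_id"], grain_columns=("order_id", "line_no")
)
print(go, issues)
```

The result is deliberately coarse: a list of issues is meant to inform the early decision and the later solution design, not to replace a full profiling exercise.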