Data vault modeling

The modeling method is designed to be resilient to changes in the business environment from which the stored data originates, by explicitly separating structural information from descriptive attributes.

Data vault is designed to enable parallel loading as much as possible,[5] so that very large implementations can scale out without the need for major redesign.

In its early days, Dan Linstedt referred to the modeling technique that would become data vault as "common foundational warehouse architecture"[8] or "common foundational modeling architecture".

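Both established approaches to modeling the storage layer of a data warehouse, Ralph Kimball's dimensional modeling and Bill Inmon's normalized modeling, have difficulty coping with changes in the systems feeding the data warehouse.[citation needed]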

Dan Linstedt, the creator of the method, describes the resulting database as follows: "The Data Vault Model is a detail oriented, historical tracking and uniquely linked set of normalized tables that support one or more functional areas of business."

Because of Sarbanes-Oxley requirements in the USA and similar measures in Europe, auditability is a relevant concern for many business intelligence implementations; hence, the focus of any data vault implementation is complete traceability and auditability of all information.
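A minimal sketch of how such traceability is commonly achieved: every row carries a load timestamp and a record source, and changes are appended rather than updated in place, so the state known at any earlier moment can be reconstructed. The column names `load_date` and `record_source` follow common data vault convention; the in-memory list standing in for a table and the helper names `load` and `latest` are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class SatelliteRow:
    hub_key: str
    load_date: datetime   # when the warehouse received this version
    record_source: str    # which source system delivered it
    attributes: tuple     # descriptive payload, e.g. (("city", "Utrecht"),)

history: list[SatelliteRow] = []  # hypothetical stand-in for an insert-only table

def latest(hub_key, until=None):
    """Most recent version of hub_key loaded at or before `until` (None = no limit)."""
    rows = [r for r in history
            if r.hub_key == hub_key and (until is None or r.load_date <= until)]
    return max(rows, key=lambda r: r.load_date) if rows else None

def load(hub_key, record_source, attributes, load_date):
    """Append a new version only when the payload changed; never update or delete."""
    current = latest(hub_key)
    if current is None or current.attributes != attributes:
        history.append(SatelliteRow(hub_key, load_date, record_source, attributes))

load("C100", "CRM", (("city", "Utrecht"),), datetime(2024, 1, 1))
load("C100", "CRM", (("city", "Utrecht"),), datetime(2024, 2, 1))  # unchanged, skipped
load("C100", "ERP", (("city", "Delft"),), datetime(2024, 3, 1))

# Audit question: what was known on 15 February, and which system delivered it?
snapshot = latest("C100", until=datetime(2024, 2, 15))
print(snapshot.attributes, snapshot.record_source)  # (('city', 'Utrecht'),) CRM
```

Because nothing is overwritten, the later change from the ERP system does not erase the earlier CRM version.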

The specification needs to evolve to include new components, along with best practices, in order to keep enterprise data warehouse (EDW) and business intelligence (BI) systems current with the needs of today's businesses.

An alternative (and seldom used) name for the method is "Common Foundational Integration Modelling Architecture".

According to Dan Linstedt, the data model is inspired by (or patterned after) a simplified view of neurons, dendrites, and synapses: neurons correspond to hubs and hub satellites, links are dendrites (vectors of information), and other links are synapses (vectors in the opposite direction).

By using a set of data mining algorithms, links can be scored with confidence and strength ratings.

Links can be created and dropped on the fly as the system learns about relationships that do not currently exist.
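The text does not say which data mining algorithms are meant. As one possible stand-in, the sketch below scores candidate links between the keys of two hubs with association-rule support and confidence, computed from how often the keys occur together in source transactions. Treating support as the "strength" rating, the function name `score_links`, and both thresholds are assumptions for illustration. Re-running the scoring on fresh data and keeping only pairs above the thresholds loosely mirrors creating and dropping links on the fly.

```python
from collections import Counter
from itertools import product

def score_links(transactions, min_support=0.2, min_confidence=0.6):
    """Score candidate links between keys of hub A and hub B (hypothetical helper).

    transactions: list of (set_of_hub_a_keys, set_of_hub_b_keys) seen together.
    support    = share of transactions containing both keys (used as 'strength')
    confidence = share of transactions containing key a that also contain key b
    """
    n = len(transactions)
    a_counts, pair_counts = Counter(), Counter()
    for a_keys, b_keys in transactions:
        a_counts.update(a_keys)
        pair_counts.update(product(a_keys, b_keys))
    links = {}
    for (a, b), together in pair_counts.items():
        support = together / n
        confidence = together / a_counts[a]
        if support >= min_support and confidence >= min_confidence:
            links[(a, b)] = (round(support, 2), round(confidence, 2))
    return links

transactions = [
    ({"cust-1"}, {"prod-A"}),
    ({"cust-1"}, {"prod-A", "prod-B"}),
    ({"cust-2"}, {"prod-B"}),
    ({"cust-1"}, {"prod-C"}),
]
links = score_links(transactions)
print(sorted(links.items()))
# [(('cust-1', 'prod-A'), (0.5, 0.67)), (('cust-2', 'prod-B'), (0.25, 1.0))]
```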

The business keys and their associations are structural attributes, forming the skeleton of the data model.

This means that choosing the correct keys for the hubs is of prime importance for the stability of your model.

Hubs contain a list of unique business keys with a low propensity to change.
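A hub can be sketched as a table holding one row per unique business key plus minimal metadata. The sketch below uses SQLite through Python's standard `sqlite3` module; the table name `H_CUSTOMER`, its columns, and the integer surrogate key are illustrative assumptions rather than a prescribed layout.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hub layout: surrogate ID, business key, and load metadata.
conn.execute("""
    CREATE TABLE H_CUSTOMER (
        customer_id   INTEGER PRIMARY KEY,   -- surrogate ID assigned by the database
        customer_nr   TEXT NOT NULL UNIQUE,  -- business key, rarely changes
        load_date     TEXT NOT NULL,         -- first time the key was seen
        record_source TEXT NOT NULL          -- system that first delivered it
    )""")

def load_hub(business_keys, load_date, record_source):
    # INSERT OR IGNORE skips keys that already exist, so the hub stays
    # a list of unique business keys and keeps their first occurrence.
    conn.executemany(
        "INSERT OR IGNORE INTO H_CUSTOMER (customer_nr, load_date, record_source) "
        "VALUES (?, ?, ?)",
        [(key, load_date, record_source) for key in business_keys])

load_hub(["C100", "C200"], "2024-01-01", "CRM")
load_hub(["C200", "C300"], "2024-01-02", "ERP")  # C200 is already present

rows = conn.execute(
    "SELECT customer_nr, record_source FROM H_CUSTOMER ORDER BY customer_nr").fetchall()
print(rows)  # [('C100', 'CRM'), ('C200', 'CRM'), ('C300', 'ERP')]
```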

Links can also reference other links. For instance, if you have an association between customer and address, you could add a reference to that association in a link between the hubs for product and transport company.
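The sketch below models link rows as immutable Python records whose references may point either at a hub key or at another link row, mirroring the customer/address example. The table names `H_CUSTOMER`, `H_ADDRESS`, `H_PRODUCT`, `H_TRANSPORT_COMPANY`, `L_CUSTOMER_ADDRESS`, and `L_SHIPMENT` are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class HubRef:
    hub: str           # hub table name (hypothetical)
    business_key: str  # key within that hub

@dataclass(frozen=True)
class LinkRow:
    link: str                          # link table name (hypothetical)
    refs: tuple[HubRef | LinkRow, ...]  # what this association connects

# Link between the customer and address hubs.
customer_address = LinkRow("L_CUSTOMER_ADDRESS",
                           (HubRef("H_CUSTOMER", "C100"), HubRef("H_ADDRESS", "A7")))

# Link between product and transport company that also references the link above.
shipment = LinkRow("L_SHIPMENT", (customer_address,
                                  HubRef("H_PRODUCT", "P42"),
                                  HubRef("H_TRANSPORT_COMPANY", "T3")))

# Frozen dataclasses are hashable, so a set keeps one row per unique combination.
link_table = {shipment, LinkRow("L_SHIPMENT", shipment.refs)}
print(len(link_table))  # 1
```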

The descriptive attributes for the information on the association (such as the time, price or amount) are stored in structures called satellite tables, which are discussed below.

An example is a satellite on the drivers link between the hubs for cars and persons, called "Driver insurance" (S_DRIVER_INSURANCE).

This satellite contains attributes that are specific to the insurance of the relationship between the car and the person driving it: for instance, an indicator of whether this is the primary driver, the name of the insurance company for this car and person (which could also be a separate hub), and a summary of the number of accidents involving this combination of vehicle and driver.
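A possible table definition for this satellite, sketched with SQLite via Python's `sqlite3` module. The column names, the integer link key, and the sample values are assumptions; only the satellite name and the three kinds of attributes come from the text. Each change is stored as a new row keyed by link and load date, so earlier versions are retained.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical columns for S_DRIVER_INSURANCE.
conn.execute("""
    CREATE TABLE S_DRIVER_INSURANCE (
        driver_link_id     INTEGER NOT NULL,  -- key of the car/person drivers link
        load_date          TEXT    NOT NULL,  -- when this version was loaded
        record_source      TEXT    NOT NULL,
        primary_driver_ind INTEGER NOT NULL CHECK (primary_driver_ind IN (0, 1)),
        insurance_company  TEXT,              -- could also be modeled as a hub
        nr_of_accidents    INTEGER NOT NULL DEFAULT 0,
        PRIMARY KEY (driver_link_id, load_date)
    )""")

conn.executemany(
    "INSERT INTO S_DRIVER_INSURANCE VALUES (?, ?, ?, ?, ?, ?)",
    [(1, "2024-01-01", "POLICY_SYS", 1, "Acme Insurance", 0),
     (1, "2024-06-01", "POLICY_SYS", 1, "Acme Insurance", 1)])  # new version, old kept

current = conn.execute("""
    SELECT primary_driver_ind, insurance_company, nr_of_accidents
    FROM S_DRIVER_INSURANCE
    WHERE driver_link_id = 1
    ORDER BY load_date DESC LIMIT 1""").fetchone()
print(current)  # (1, 'Acme Insurance', 1)
```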

In a satellite, a sequence number becomes mandatory if it is needed to enforce uniqueness for multiple valid satellite records on the same hub or link.
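To illustrate why the sequence number is needed, the sketch below stores two phone numbers for one customer that arrive in the same load and are both valid. Without `sequence_nr` in the primary key, the two rows would collide on (customer, load date). The satellite `S_CUSTOMER_PHONE`, its columns, and the data are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical hub satellite with a sequence number in its key.
conn.execute("""
    CREATE TABLE S_CUSTOMER_PHONE (
        customer_id   INTEGER NOT NULL,   -- hub key
        load_date     TEXT    NOT NULL,
        sequence_nr   INTEGER NOT NULL,   -- tells apart rows valid at the same time
        record_source TEXT    NOT NULL,
        phone_number  TEXT    NOT NULL,
        PRIMARY KEY (customer_id, load_date, sequence_nr)
    )""")

# Two numbers delivered in the same load, both valid.
phones = ["555-0101", "555-0102"]
conn.executemany(
    "INSERT INTO S_CUSTOMER_PHONE VALUES (?, ?, ?, ?, ?)",
    [(7, "2024-01-01", seq, "CRM", number) for seq, number in enumerate(phones, start=1)])

# Re-using an existing (customer_id, load_date, sequence_nr) is still rejected.
try:
    conn.execute(
        "INSERT INTO S_CUSTOMER_PHONE VALUES (7, '2024-01-01', 1, 'CRM', '555-0199')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM S_CUSTOMER_PHONE").fetchone()[0]
print(row_count, duplicate_rejected)  # 2 True
```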

Reference tables exist to prevent redundant storage of simple reference data that is referenced frequently.

More formally, Dan Linstedt defines reference data as follows: "Any information deemed necessary to resolve descriptions from codes, or to translate keys in to (sic) a consistent manner."
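A minimal sketch of reference data in this sense: the code-to-description mapping is stored once and every satellite row resolves its codes through the same function, so translation is consistent. The table `R_COUNTRY`, the function `resolve`, the normalization (trim and upper-case), and the "Unknown" default are all hypothetical choices.

```python
# Hypothetical reference table: each country code and description stored once.
R_COUNTRY = {"NL": "Netherlands", "BE": "Belgium", "DE": "Germany"}

def resolve(code: str, reference: dict[str, str], unknown: str = "Unknown") -> str:
    """Translate a code into its description in one consistent way."""
    return reference.get(code.strip().upper(), unknown)

satellite_rows = [
    {"customer_id": 1, "country_cd": "nl"},
    {"customer_id": 2, "country_cd": "XX"},
]
resolved = [{**row, "country": resolve(row["country_cd"], R_COUNTRY)}
            for row in satellite_rows]
print([row["country"] for row in resolved])  # ['Netherlands', 'Unknown']
```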

Once the hubs have been loaded, you can create the links between them. At the same time, you can also create all satellites that are attached to hubs, since each business key can now be resolved to a surrogate ID.
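The loading order can be sketched as follows, assuming in-memory dictionaries that map business keys to surrogate IDs in place of hub tables. Hubs are loaded first; afterwards, links and hub satellites only need read-only key lookups, so they can run concurrently. The hub names, the shared ID counter, and the helper functions are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import count

# Step 1: load the hubs; each new business key receives a surrogate ID.
_next_id = count(1)
hub_customer = {}  # business key -> surrogate ID (hypothetical hub)
hub_car = {}

def load_hub(hub, business_keys):
    for key in business_keys:
        if key not in hub:
            hub[key] = next(_next_id)

load_hub(hub_customer, ["C100", "C200"])
load_hub(hub_car, ["NL-12-AB"])

# Step 2: links and hub satellites only read the hubs, so they can run in parallel.
def load_link(pairs):
    return [(hub_customer[customer], hub_car[car]) for customer, car in pairs]

def load_customer_satellite(rows):
    return [(hub_customer[key], attributes) for key, attributes in rows]

with ThreadPoolExecutor() as pool:
    link_job = pool.submit(load_link, [("C100", "NL-12-AB")])
    sat_job = pool.submit(load_customer_satellite, [("C200", {"name": "Jansen"})])
    link_rows, sat_rows = link_job.result(), sat_job.result()

print(link_rows, sat_rows)  # [(1, 3)] [(2, {'name': 'Jansen'})]
```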

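Links that reference other links add a loading dependency, because the referenced link must be loaded first. Since such a link is equivalent to a link between multiple hubs, this difficulty can be avoided by remodeling such cases, which is in fact the recommended practice.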
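The remodeling can be sketched as replacing each reference to another link by the hub references that link contains, giving an equivalent link that depends only on hubs and can therefore be loaded as soon as the hubs are. The record types, the function `remodel`, and all table names are hypothetical.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class HubRef:
    hub: str
    business_key: str

@dataclass(frozen=True)
class LinkRow:
    link: str
    refs: tuple[HubRef | LinkRow, ...]

def remodel(row: LinkRow, new_name: str) -> LinkRow:
    """Return an equivalent link whose references point only at hubs."""
    def hub_refs(ref):
        if isinstance(ref, HubRef):
            yield ref
        else:  # a nested link: descend into its own references
            for child in ref.refs:
                yield from hub_refs(child)
    # dict.fromkeys removes duplicates while keeping the original order.
    flattened = tuple(dict.fromkeys(h for ref in row.refs for h in hub_refs(ref)))
    return LinkRow(new_name, flattened)

customer_address = LinkRow("L_CUSTOMER_ADDRESS",
                           (HubRef("H_CUSTOMER", "C100"), HubRef("H_ADDRESS", "A7")))
shipment = LinkRow("L_SHIPMENT", (customer_address,
                                  HubRef("H_PRODUCT", "P42"),
                                  HubRef("H_TRANSPORT_COMPANY", "T3")))

flat = remodel(shipment, "L_SHIPMENT_FLAT")
print([h.hub for h in flat.refs])
# ['H_CUSTOMER', 'H_ADDRESS', 'H_PRODUCT', 'H_TRANSPORT_COMPANY']
```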

The data vault model is not optimised for query performance, nor is it easy to query with well-known query tools such as Cognos, Oracle Business Intelligence Suite Enterprise Edition, SAP BusinessObjects, Pentaho, et al.[citation needed] Since these end-user computing tools expect or prefer their data to be contained in a dimensional model, a conversion is usually necessary.
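One small piece of such a conversion, sketched with SQLite: a view that presents a hub and its satellite as a current-state dimension with one row per business key, taking the most recent satellite version. The tables `H_CUSTOMER` and `S_CUSTOMER`, the view `DIM_CUSTOMER`, and the sample rows are hypothetical. A full conversion would also build fact tables and might keep history instead of only the latest version.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE H_CUSTOMER (customer_id INTEGER PRIMARY KEY, customer_nr TEXT UNIQUE);
    CREATE TABLE S_CUSTOMER (customer_id INTEGER, load_date TEXT, name TEXT, city TEXT,
                             PRIMARY KEY (customer_id, load_date));
    INSERT INTO H_CUSTOMER VALUES (1, 'C100'), (2, 'C200');
    INSERT INTO S_CUSTOMER VALUES
        (1, '2024-01-01', 'Jansen', 'Utrecht'),
        (1, '2024-05-01', 'Jansen', 'Delft'),
        (2, '2024-02-01', 'de Vries', 'Leiden');

    -- Current-state dimension: one row per hub key, latest satellite version only.
    CREATE VIEW DIM_CUSTOMER AS
    SELECT h.customer_id AS customer_key, h.customer_nr, s.name, s.city
    FROM H_CUSTOMER h
    JOIN S_CUSTOMER s ON s.customer_id = h.customer_id
    WHERE s.load_date = (SELECT MAX(load_date) FROM S_CUSTOMER
                         WHERE customer_id = h.customer_id);
""")
dimension = conn.execute("SELECT * FROM DIM_CUSTOMER ORDER BY customer_key").fetchall()
print(dimension)  # [(1, 'C100', 'Jansen', 'Delft'), (2, 'C200', 'de Vries', 'Leiden')]
```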

The data vault methodology includes multiple components of CMMI Level 5 and combines them with best practices from Six Sigma, total quality management (TQM), and the software development life cycle (SDLC).

Teams using the data vault methodology should readily adapt to the repeatable, consistent, and measurable projects that are expected at CMMI Level 5.