Data quality

The mainframes used business rules to correct common misspellings and typographical errors in name and address data, as well as to track customers who had moved, died, gone to prison, married, divorced, or experienced other life-changing events.

This technology saved large companies millions of dollars in comparison to manual correction of customer data.

Large companies saved on postage, as bills and direct marketing materials made their way to the intended customer more accurately.

Initially sold as a service, data quality moved inside the walls of corporations, as low-cost and powerful server technology became available.

Another framework is grounded in semiotics, evaluating the quality of the form, meaning, and use of the data (Price and Shanks, 2004).

One highly theoretical approach analyzes the ontological nature of information systems to define data quality rigorously (Wand and Wang, 1996).

MIT has an Information Quality (MITIQ) Program, led by Professor Richard Wang, which produces a large number of publications and hosts a significant international conference in this field (International Conference on Information Quality, ICIQ).

One industry study estimated the total cost to the U.S. economy of data quality problems at over U.S. $600 billion per annum (Eckerson, 2002).

One reason contact data becomes stale very quickly in the average database is that more than 45 million Americans change their address every year.[13]

A specific example: providing invalid measurements from several sensors to the automatic pilot feature of an aircraft could cause it to crash.
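To make that example concrete, the sketch below shows one way a validity check on redundant sensor readings might look. The range bounds, function names, and median-voting strategy are illustrative assumptions, not how any real avionics system works.

```python
# Minimal sketch of a validity check on redundant sensor readings.
# Bounds, names, and the median-voting strategy are illustrative assumptions;
# real avionics use far more rigorous redundancy and voting logic.
from statistics import median

PLAUSIBLE_ALTITUDE_FT = (-1000.0, 60000.0)  # assumed plausible range

def validated_altitude(readings: list[float]) -> float:
    """Discard out-of-range readings, then vote on the survivors via the median."""
    low, high = PLAUSIBLE_ALTITUDE_FT
    plausible = [r for r in readings if low <= r <= high]
    if len(plausible) < 2:  # too few agreeing sensors to trust the value
        raise ValueError("insufficient valid sensor readings")
    return median(plausible)

# One sensor returns garbage; the check drops it instead of feeding it onward.
print(validated_altitude([35000.2, 34999.8, -99999.0]))  # -> 35000.0
```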

Data quality checks may be defined at the attribute level to allow full control over the remediation steps.
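As a rough illustration of attribute-level checks, each attribute can be paired with its own validation rule and its own remediation step. The rules, default values, and record layout below are hypothetical:

```python
# Sketch of attribute-level DQ checks, each with its own remediation step.
# All rules, defaults, and field names here are hypothetical examples.

def valid_email(value) -> bool:
    return isinstance(value, str) and "@" in value

def valid_age(value) -> bool:
    return isinstance(value, int) and 0 <= value <= 120

# attribute -> (validation check, remediation applied on failure)
RULES = {
    "email": (valid_email, lambda v: "unknown@example.com"),  # substitute a default
    "age":   (valid_age,   lambda v: None),                   # null out the bad value
}

def remediate(record: dict) -> dict:
    cleaned = dict(record)
    for attr, (check, fix) in RULES.items():
        if not check(record.get(attr)):
            cleaned[attr] = fix(record.get(attr))
    return cleaned

print(remediate({"email": "not-an-email", "age": 250}))
# -> {'email': 'unknown@example.com', 'age': None}
```

Keeping the check and the remediation together per attribute is what provides the fine-grained control described above: each attribute can be repaired, defaulted, or nulled independently.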

Data quality checks are redundant if business logic already covers the same functionality and fulfills the same purpose as the DQ check.

Some data quality checks may be translated into business rules after repeated exceptions have been observed in the past.

Such a check may be a simple, generic aggregation rule applied to a large chunk of data, or it may be complex logic over a group of attributes of a transaction pertaining to the organization's core business.
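A sketch of the simple aggregation variant follows: compare today's total against a historical baseline and flag large deviations. The tolerance, field names, and baseline method are assumptions for illustration only.

```python
# Sketch of a reasonableness check: flag a day whose aggregate deviates too far
# from the historical mean. Tolerance and field names are assumed for illustration.

def daily_total(transactions: list[dict]) -> float:
    return sum(t["amount"] for t in transactions)

def is_reasonable(today: list[dict], history: list[float], tolerance: float = 0.5) -> bool:
    """Return False if today's total deviates from the historical mean
    by more than `tolerance` (50% by default)."""
    baseline = sum(history) / len(history)
    deviation = abs(daily_total(today) - baseline) / baseline
    return deviation <= tolerance

history = [10200.0, 9800.0, 10050.0]           # prior daily totals
today = [{"amount": 120.0}, {"amount": 80.0}]  # a suspiciously quiet day
print(is_reasonable(today, history))           # -> False: flag for review
```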

Discovery of reasonableness issues may inform policy and strategy changes by business governance, data governance, or both.

For instance, a DQ check for completeness and precision on not-null columns is redundant for data sourced from a database, since the database itself already enforces those constraints.

Within healthcare, wearable technologies and Body Area Networks generate large volumes of data.

This is also true for the vast majority of mHealth apps, EHRs, and other health-related software solutions.

The primary reason for this stems from the extra cost involved in adding a higher degree of rigor to the software architecture.[21]

mHealth is an increasingly important strategy for the delivery of health services in low- and middle-income countries.[2]

However, these mobile devices are commonly used for personal activities, as well, leaving them more vulnerable to security risks that could lead to data breaches.

Data quality has become a major focus of public health programs in recent years, especially as demand for accountability increases.[23]