Data engineering

Around the 1970s/1980s the term information engineering methodology (IEM) was created to describe database design and the use of software for data analysis and processing.[3][4] These techniques were intended to be used by database administrators (DBAs) and by systems analysts based upon an understanding of the operational processing needs of organizations for the 1980s. In particular, these techniques were meant to help bridge the gap between strategic business planning and information systems. A key early contributor (often called the "father" of information engineering methodology) was the Australian Clive Finkelstein, who wrote several articles about it between 1976 and 1980, and also co-authored an influential Savant Institute report on it with James Martin.[3][8]

Due to the new scale of the data, major firms like Google, Facebook, Amazon, Apple, Microsoft, and Netflix started to move away from traditional ETL and storage techniques.[15]

More recently, NewSQL databases, which attempt to allow horizontal scaling while retaining ACID guarantees, have become popular.
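For illustration only, the sketch below runs an ordinary ACID transaction against CockroachDB, one such NewSQL database, through the psycopg2 driver (CockroachDB speaks the PostgreSQL wire protocol); the connection string, table, and account rows are hypothetical.

```python
# Sketch: a standard ACID transaction against a NewSQL database.
# The DSN, table, and rows below are hypothetical; credentials and a running
# cluster are assumed. CockroachDB accepts PostgreSQL clients such as psycopg2.
import psycopg2

conn = psycopg2.connect("postgresql://app_user@db.example.internal:26257/orders")

try:
    with conn:                        # commit on success, rollback on exception
        with conn.cursor() as cur:
            # Both updates succeed or fail together (atomicity), even though the
            # rows may live on different nodes of the horizontally scaled cluster.
            cur.execute(
                "UPDATE accounts SET balance = balance - %s WHERE id = %s",
                (100, 1),
            )
            cur.execute(
                "UPDATE accounts SET balance = balance + %s WHERE id = %s",
                (100, 2),
            )
finally:
    conn.close()
```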

A data lake can be created on premises or in a cloud-based environment using the services from public cloud vendors such as Amazon, Microsoft, or Google.
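As a minimal sketch of the cloud-based approach, the snippet below lands a raw file in Amazon S3 object storage, a common foundation for a cloud data lake; the bucket name, key layout, and input file are hypothetical, and equivalent calls exist for Azure Blob Storage and Google Cloud Storage.

```python
# Sketch: landing raw data in cloud object storage as a data lake "raw" zone.
# The bucket and key prefix are hypothetical; credentials are assumed to come
# from the environment (e.g. an IAM role or AWS_ACCESS_KEY_ID).
import boto3

s3 = boto3.client("s3")

# Data lakes typically keep files in their original format, partitioned by
# source and date so downstream jobs can locate and reprocess them later.
with open("events-2024-01-01.json", "rb") as f:
    s3.put_object(
        Bucket="example-data-lake-raw",
        Key="clickstream/ingest_date=2024-01-01/events.json",
        Body=f,
    )
```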

The number and variety of different data processes and storage locations can become overwhelming for users.

This inspired the use of a workflow management system (e.g. Airflow) to allow the data tasks to be specified, created, and monitored.[28]
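A minimal sketch of this pattern, assuming a recent Airflow 2.x release, is shown below; the DAG id, schedule, and task bodies are placeholders chosen for illustration rather than a real pipeline.

```python
# Sketch: a minimal Airflow DAG in which two data tasks are specified,
# scheduled, and monitored by the workflow management system.
# The task bodies are placeholders; only the orchestration pattern matters here.
from datetime import datetime

from airflow import DAG
from airflow.operators.python import PythonOperator


def extract():
    print("pull raw data from the source system")


def load():
    print("write the prepared data to the warehouse")


with DAG(
    dag_id="example_daily_pipeline",
    start_date=datetime(2024, 1, 1),
    schedule="@daily",            # one tracked run per day
    catchup=False,
) as dag:
    extract_task = PythonOperator(task_id="extract", python_callable=extract)
    load_task = PythonOperator(task_id="load", python_callable=load)

    extract_task >> load_task     # dependency the scheduler enforces
```

Once the DAG file is deployed, the scheduler creates a run for each interval and records every task's state, which is what makes the pipeline observable and monitorable from the Airflow UI.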

Data engineers are focused on the production readiness of data and things like formats, resilience, scaling, and security.