Lambda architecture

The rise of lambda architecture is correlated with the growth of big data, real-time analytics, and the drive to mitigate the latencies of map-reduce.

This paradigm was first described by Nathan Marz in a blog post titled "How to beat the CAP theorem", in which he originally called it the "batch/realtime architecture".

[3]: 93, 287, 293 Metamarkets, which provides analytics for companies in the programmatic advertising space, employs a version of the lambda architecture that uses Druid for storing and serving both the streamed and batch-processed data.

The batch and streaming sides each require a different code base that must be maintained and kept in sync so that processed data produces the same result from both paths.
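The synchronization burden can be illustrated with a minimal sketch. It assumes a toy page-view counter, and every name in it (`EVENTS`, `batch_view`, `SpeedView`, `query`, `BATCH_CUTOFF`) is hypothetical rather than drawn from any particular framework. The batch path recomputes a view from stored data, the speed path updates a view incrementally, and a query merges the two. The same counting logic is written twice, which is the duplication described above.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs.
EVENTS = [(1, "home"), (2, "about"), (3, "home"), (4, "home"), (5, "about")]

# Assumption: the last batch run covered events up to this timestamp.
BATCH_CUTOFF = 3


def batch_view(events):
    """Batch path: recompute counts from the stored data in one pass."""
    return Counter(page for _, page in events)


class SpeedView:
    """Speed path: update counts incrementally, one event at a time."""

    def __init__(self):
        self.counts = Counter()

    def on_event(self, event):
        _, page = event
        self.counts[page] += 1


def query(batch_counts, speed_counts, page):
    """Serving step: merge the batch view with the realtime delta."""
    return batch_counts[page] + speed_counts[page]


# Batch layer sees only events up to the cutoff.
batch = batch_view([e for e in EVENTS if e[0] <= BATCH_CUTOFF])

# Speed layer sees only events that arrived after the cutoff.
speed = SpeedView()
for event in EVENTS:
    if event[0] > BATCH_CUTOFF:
        speed.on_event(event)

home_total = query(batch, speed.counts, "home")
about_total = query(batch, speed.counts, "about")
```

If the two paths ever diverge, for example if one of them starts normalizing page names and the other does not, the merged result no longer matches a full recomputation over all events.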

Yet attempting to abstract the code bases into a single framework puts many of the specialized tools in the batch and real-time ecosystems out of reach.

[13] Jay Kreps introduced the kappa architecture to use a pure streaming approach with a single code base.
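A rough sketch of the single-code-base idea, under the assumption that all input is kept in a replayable, append-only log: when the processing logic changes, the log is replayed through the new logic to build a fresh view, and no separate batch implementation is needed. The names `LOG`, `replay`, `count_v1`, and `count_v2` are hypothetical and only illustrate the pattern.

```python
from collections import Counter

# Hypothetical append-only log of page-view records.
LOG = ["Home", "about", "home", "HOME", "about"]


def count_v1(view, record):
    """Original stream logic: count page names exactly as received."""
    view[record] += 1


def count_v2(view, record):
    """Revised stream logic: normalize case before counting."""
    view[record.lower()] += 1


def replay(log, step):
    """Run one processing step over the log from the start, building a new view."""
    view = Counter()
    for record in log:
        step(view, record)
    return view


# The view built by the original logic.
view_v1 = replay(LOG, count_v1)

# Reprocessing: replay the same log through the revised logic,
# then serve queries from the new view.
view_v2 = replay(LOG, count_v2)
```

In this sketch the same `replay` function serves both live processing and reprocessing, so only one implementation of the counting logic exists at any time.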

[14] Such a streaming framework could allow for collecting and processing arbitrarily large windows of data, accommodate blocking, and handle state.
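Windowing and state can be illustrated with a small sketch of a tumbling (fixed-size, non-overlapping) window count. This is an illustrative assumption about how such windows are typically computed, not the API of any specific streaming framework. The function `tumbling_window_counts` and the sample events are hypothetical, and for simplicity the input is an in-memory list rather than an unbounded stream.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size):
    """Group (timestamp, key) events into fixed windows and count keys per window."""
    # State kept by the processor: window start -> key -> count.
    state = defaultdict(lambda: defaultdict(int))
    for timestamp, key in events:
        # Each event belongs to exactly one window.
        window_start = (timestamp // window_size) * window_size
        state[window_start][key] += 1
    # Return plain dicts, ordered by window start.
    return {start: dict(counts) for start, counts in sorted(state.items())}


# Hypothetical events as (timestamp, page) pairs.
events = [(0, "home"), (4, "about"), (9, "home"), (10, "home"), (17, "about")]
windows = tumbling_window_counts(events, window_size=10)
```

Making `window_size` larger widens the span of data each result covers, at the cost of more state held per window.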

Figure: Flow of data through the processing and serving layers of a generic lambda architecture, with example named components.
Figure: A lambda architecture with a Druid data store.