The core of Apache Flink is a distributed streaming data-flow engine written in Java and Scala.[3][4]
Flink executes arbitrary dataflow programs in a data-parallel and pipelined (hence task-parallel) manner.[5]
Flink's pipelined runtime system enables the execution of bulk/batch and stream processing programs.[8]
Flink provides a high-throughput, low-latency streaming engine[9] as well as support for event-time processing and state management.
Apache Flink's dataflow programming model provides event-at-a-time processing on both finite and infinite datasets.[16]
Flink programs run as a distributed system within a cluster and can be deployed in standalone mode or on resource-management frameworks such as YARN, Mesos, and Docker-based setups.[20]
Apache Flink includes a lightweight fault tolerance mechanism based on distributed checkpoints.
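Checkpointing can be configured cluster-wide in Flink's `flink-conf.yaml`. As a minimal sketch, assuming the standard configuration keys, enabling periodic checkpoints to durable storage might look like the following (the interval and directory values are illustrative, not recommendations):

```yaml
# Take a consistent snapshot of all operator state every 10 seconds (illustrative value).
execution.checkpointing.interval: 10s
# Durable location where completed checkpoints are stored (illustrative path).
state.checkpoints.dir: file:///tmp/flink-checkpoints
```

On failure, Flink restarts the job from the most recent completed checkpoint, replaying input from that point rather than from the beginning of the stream.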
Flink's DataStream API enables transformations (e.g. filters, aggregations, window functions) on bounded or unbounded streams of data.
A simple example of a stateful stream processing program is an application that emits a word count from a continuous input stream and groups the data in 5-second windows.

Apache Beam “provides an advanced unified programming model, allowing (a developer) to implement batch and streaming data processing jobs that can run on any execution engine.”[23] The Apache Flink-on-Beam runner is the most feature-rich according to a capability matrix maintained by the Beam community.
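The semantics of the 5-second windowed word count described above can be sketched in plain Java, without the Flink API: each timestamped line is assigned to a tumbling window by integer-dividing its timestamp by the window size, and counts are aggregated per window and per word. The `Event` record, timestamps, and window size here are illustrative assumptions, not Flink constructs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java sketch of a 5-second tumbling-window word count.
// It emulates the grouping semantics of a windowed aggregation;
// a real Flink job would express this with the DataStream API.
public class WindowWordCountSketch {

    // A line of input paired with its event timestamp in milliseconds (illustrative model).
    record Event(long timestampMs, String line) {}

    // Returns counts keyed by window start time, then by word.
    // Windows tumble: window start = floor(timestamp / windowSizeMs) * windowSizeMs.
    static Map<Long, Map<String, Integer>> countPerWindow(List<Event> events, long windowSizeMs) {
        Map<Long, Map<String, Integer>> result = new LinkedHashMap<>();
        for (Event e : events) {
            long windowStart = (e.timestampMs / windowSizeMs) * windowSizeMs;
            Map<String, Integer> counts =
                    result.computeIfAbsent(windowStart, k -> new LinkedHashMap<>());
            for (String word : e.line.toLowerCase().split("\\W+")) {
                if (!word.isEmpty()) {
                    counts.merge(word, 1, Integer::sum);
                }
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Event> events = List.of(
                new Event(1_000, "to be or not"),
                new Event(4_000, "to be"),
                new Event(6_500, "be")); // falls into the next 5-second window
        // Two windows: [0s, 5s) and [5s, 10s).
        System.out.println(countPerWindow(events, 5_000));
    }
}
```

Unlike this batch-style sketch, Flink evaluates such windows incrementally over an unbounded stream, emitting each window's counts as the window closes.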
In 2020, following the COVID-19 pandemic, Flink Forward's spring edition, which was to be hosted in San Francisco, was canceled.[28]
In 2010, the research project "Stratosphere: Information Management on the Cloud"[29] led by Volker Markl (funded by the German Research Foundation (DFG))[30] was started as a collaboration of Technische Universität Berlin, Humboldt-Universität zu Berlin, and Hasso-Plattner-Institut Potsdam.
Flink started from a fork of Stratosphere's distributed execution engine and became an Apache Incubator project in March 2014.