Apache Oozie

Apache Oozie is a server-based workflow scheduling system for managing Hadoop jobs.

Workflows in Oozie are defined as a collection of control flow and action nodes in a directed acyclic graph.
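
The graph structure described above can be illustrated with a small sketch. It is not Oozie's workflow definition format or API: the node names, the "control"/"action" labels, and the function `topological_order` are all hypothetical, chosen only to show how a set of nodes and transitions can be checked for acyclicity and put into a valid execution order.

```python
from collections import deque

# Hypothetical workflow: node name -> (kind, list of successor node names).
# Names and kinds are illustrative only, not Oozie's actual schema.
workflow = {
    "start":    ("control", ["fork"]),
    "fork":     ("control", ["mr-step", "pig-step"]),
    "mr-step":  ("action",  ["join"]),
    "pig-step": ("action",  ["join"]),
    "join":     ("control", ["end"]),
    "end":      ("control", []),
}

def topological_order(graph):
    """Return the nodes in a valid execution order; raise if a cycle exists."""
    # Count incoming transitions for every node.
    indegree = {name: 0 for name in graph}
    for _, successors in graph.values():
        for nxt in successors:
            indegree[nxt] += 1

    # Nodes with no unmet predecessors are ready to run.
    ready = deque(name for name, deg in indegree.items() if deg == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for nxt in graph[node][1]:
            indegree[nxt] -= 1
            if indegree[nxt] == 0:
                ready.append(nxt)

    # If some nodes were never released, the graph has a cycle.
    if len(order) != len(graph):
        raise ValueError("workflow contains a cycle")
    return order

order = topological_order(workflow)
```

In this sketch the two action nodes sit between a fork and a join, so both become ready as soon as the fork has been processed, and the join is released only after both have finished.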

Action nodes are the mechanism by which a workflow triggers the execution of a computation or processing task.

Oozie supports several types of actions, including Hadoop MapReduce, Hadoop Distributed File System (HDFS) operations, Pig, SSH, and email.

Several identical workflow jobs can run concurrently, provided each is properly parameterized, for example by giving each job a different output directory.
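
A minimal sketch of this kind of parameterization follows. It does not use any Oozie API: the function names, the parameter keys (`workflowName`, `runId`, `outputDir`), and the base path are assumptions made for illustration. The point is only that each run of the same workflow receives its own output directory, so concurrent runs do not write to the same location.

```python
def make_job_params(workflow_name, run_ids, base_dir="/user/example/output"):
    """Build one parameter set per run, each with its own output directory.

    All keys and the base_dir default are hypothetical.
    """
    return [
        {
            "workflowName": workflow_name,
            "runId": run_id,
            # The run identifier makes the directory unique per job.
            "outputDir": f"{base_dir}/{workflow_name}/{run_id}",
        }
        for run_id in run_ids
    ]

def output_dirs_are_distinct(param_sets):
    """Return True if no two parameter sets share an output directory."""
    dirs = [p["outputDir"] for p in param_sets]
    return len(dirs) == len(set(dirs))

params = make_job_params("daily-report", ["2024-01-01", "2024-01-02"])
```

A check such as `output_dirs_are_distinct(params)` could be run before submitting the jobs, to catch two runs that would otherwise share an output location.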