Algorithmic skeleton

Algorithmic skeletons take advantage of common programming patterns to hide the complexity of parallel and distributed applications.
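
At their simplest, such patterns can be captured as ordinary higher-order functions. The following is a minimal sketch in plain C++ (not taken from any particular framework discussed below) of a data-parallel map skeleton: the programmer supplies only the sequential function, while work partitioning, thread creation, and synchronization stay hidden inside the skeleton.

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <future>
    #include <vector>

    // A minimal "map" skeleton: applies a user-supplied sequential function
    // to every element of the input, splitting the work across worker tasks.
    // The caller never deals with threads or synchronization directly.
    template <typename T, typename F>
    std::vector<T> map_skeleton(const std::vector<T>& input, F f, unsigned workers = 4) {
        std::vector<T> output(input.size());
        std::vector<std::future<void>> tasks;
        std::size_t chunk = (input.size() + workers - 1) / workers;
        for (unsigned w = 0; w < workers; ++w) {
            std::size_t lo = std::min(input.size(), w * chunk);
            std::size_t hi = std::min(input.size(), lo + chunk);
            tasks.push_back(std::async(std::launch::async, [&, lo, hi] {
                for (std::size_t i = lo; i < hi; ++i) output[i] = f(input[i]);
            }));
        }
        for (auto& t : tasks) t.get();   // barrier: wait for all workers
        return output;
    }

    int main() {
        std::vector<int> data{1, 2, 3, 4, 5, 6, 7, 8};
        auto squares = map_skeleton(data, [](int x) { return x * x; });
        for (int v : squares) std::printf("%d ", v);
        std::printf("\n");
    }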

AdHoc,[4][5] a hierarchical and fault-tolerant Distributed Shared Memory (DSM) system, is used to interconnect streams of data between processing elements by providing a repository with get/put/remove/execute operations.
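
The sketch below illustrates the kind of repository interface such operations suggest; the class and method names are hypothetical and do not reproduce AdHoc's actual API, and a real DSM would of course distribute and replicate the stored data rather than keep it in one local map.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <optional>
    #include <string>
    #include <vector>

    using Blob = std::vector<char>;

    // Hypothetical repository interface in the spirit of get/put/remove/execute.
    class Repository {
    public:
        virtual ~Repository() = default;
        virtual void put(const std::string& key, Blob value) = 0;
        virtual std::optional<Blob> get(const std::string& key) = 0;
        virtual void remove(const std::string& key) = 0;
        // "execute" runs code close to the data instead of moving the data.
        virtual void execute(const std::string& key,
                             const std::function<void(Blob&)>& op) = 0;
    };

    // Trivial single-node stand-in, useful only to show how clients use it.
    class LocalRepository : public Repository {
        std::map<std::string, Blob> store_;
    public:
        void put(const std::string& key, Blob value) override { store_[key] = std::move(value); }
        std::optional<Blob> get(const std::string& key) override {
            auto it = store_.find(key);
            if (it == store_.end()) return std::nullopt;
            return it->second;
        }
        void remove(const std::string& key) override { store_.erase(key); }
        void execute(const std::string& key, const std::function<void(Blob&)>& op) override {
            auto it = store_.find(key);
            if (it != store_.end()) op(it->second);
        }
    };

    int main() {
        LocalRepository repo;
        repo.put("task:1", Blob{'a', 'b', 'c'});
        repo.execute("task:1", [](Blob& b) { b.push_back('!'); });  // run code on the data
        std::printf("stored %zu bytes\n", repo.get("task:1")->size());
    }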

ASSIST also supports autonomic control of parmods: a parmod may be subject to a performance contract, which it satisfies by dynamically adapting the number of resources it uses.
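
The adaptation loop behind such a contract can be pictured as follows; this is a schematic illustration with a made-up throughput model, not ASSIST code, and the names are hypothetical.

    #include <cstdio>

    struct Contract {
        double min_throughput;   // items per second promised to the user
    };

    class ParallelModule {
        int workers_ = 1;
    public:
        int workers() const { return workers_; }
        void add_worker()    { ++workers_; }
        void remove_worker() { if (workers_ > 1) --workers_; }
        // In a real system this would be measured at run time; here it is a
        // made-up model where each worker contributes 100 items/s.
        double measured_throughput() const { return workers_ * 100.0; }
    };

    // Manager step: compare measured throughput with the contract and adapt.
    void control_step(ParallelModule& mod, const Contract& c) {
        double t = mod.measured_throughput();
        if (t < c.min_throughput)              mod.add_worker();     // under contract
        else if (t > 1.5 * c.min_throughput)   mod.remove_worker();  // over-provisioned
    }

    int main() {
        ParallelModule mod;
        Contract contract{450.0};
        for (int step = 0; step < 10; ++step) {
            control_step(mod, contract);
            std::printf("step %d: %d workers, %.0f items/s\n",
                        step, mod.workers(), mod.measured_throughput());
        }
    }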

Then, programmers fill the hooks required for the pattern, and new code is generated as a framework in Java for the parallel execution of the application.

Calcium supports the execution of skeleton applications on top of the ProActive environment for distributed cluster-like infrastructures.

Second, it offers a type system for nestable skeletons, which is proven to guarantee subject reduction properties and is implemented using Java Generics.[8]

Instead, skeletons are defined on top of Eden's lower-level process abstraction, supporting both task and data parallelism.

FastFlow[21][22] is a skeletal parallel programming framework specifically targeted at the development of streaming and data-parallel applications.

Like other high-level programming frameworks, such as Intel TBB and OpenMP, it simplifies the design and engineering of portable parallel applications.
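
Its streaming skeletons (e.g. pipeline and farm) connect concurrent stages through queues; FastFlow itself uses lock-free queues and expresses stages as framework classes, but the pattern can be pictured with the plain C++ sketch below, which is purely illustrative and not FastFlow code.

    #include <condition_variable>
    #include <cstdio>
    #include <mutex>
    #include <optional>
    #include <queue>
    #include <thread>

    // Thread-safe queue used as the channel between pipeline stages.
    template <typename T>
    class Channel {
        std::queue<T> q_;
        std::mutex m_;
        std::condition_variable cv_;
        bool closed_ = false;
    public:
        void push(T v) { { std::lock_guard<std::mutex> l(m_); q_.push(std::move(v)); } cv_.notify_one(); }
        void close()   { { std::lock_guard<std::mutex> l(m_); closed_ = true; } cv_.notify_all(); }
        std::optional<T> pop() {
            std::unique_lock<std::mutex> l(m_);
            cv_.wait(l, [&] { return !q_.empty() || closed_; });
            if (q_.empty()) return std::nullopt;     // closed and drained
            T v = std::move(q_.front()); q_.pop(); return v;
        }
    };

    int main() {
        Channel<int> a, b;

        // Stage 1: generate a stream of items.
        std::thread gen([&] { for (int i = 1; i <= 5; ++i) a.push(i); a.close(); });

        // Stage 2: transform each item as it flows through.
        std::thread sq([&] { while (auto v = a.pop()) b.push(*v * *v); b.close(); });

        // Stage 3: consume the results.
        std::thread sink([&] { while (auto v = b.pop()) std::printf("%d\n", *v); });

        gen.join(); sq.join(); sink.join();
    }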

HOCs are Grid-enabled skeletons, implemented as components on top of the Globus Toolkit and remotely accessible via Web Services.

The evaluation of a skeleton application follows a formal definition of operational semantics introduced by Aldinucci and Danelutto,[35][36] which can handle both task and data parallelism.

Additionally, several performance optimizations are applied, such as skeleton rewriting techniques,[18][10] task lookahead, and server-to-server lazy binding.

When the input stream receives a new parameter, the skeleton program is processed to obtain a macro-data flow graph.

The nodes of the graph are macro-data flow instructions (MDFi) which represent the sequential pieces of code provided by the programmer.
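
A schematic interpreter for such a graph might look as follows; this is an illustration of the idea in C++ rather than any framework's actual implementation, and the structure and function names are hypothetical.

    #include <cstdio>
    #include <functional>
    #include <map>
    #include <queue>
    #include <vector>

    // Each macro-data flow instruction (MDFi) wraps a piece of sequential code,
    // waits for its input tokens, and forwards its result to the instructions
    // listed as destinations.
    struct MDFi {
        std::function<int(const std::vector<int>&)> code; // sequential user code
        int needed;                                       // how many inputs it waits for
        std::vector<int> tokens;                          // inputs received so far
        std::vector<int> dests;                           // ids of downstream MDFi
    };

    void interpret(std::map<int, MDFi>& graph, int entry, int input) {
        std::queue<std::pair<int, int>> pending;          // (instruction id, token)
        pending.push({entry, input});
        while (!pending.empty()) {
            auto [id, tok] = pending.front(); pending.pop();
            MDFi& ins = graph.at(id);
            ins.tokens.push_back(tok);
            if ((int)ins.tokens.size() < ins.needed) continue;   // not fireable yet
            int result = ins.code(ins.tokens);                   // fire the instruction
            ins.tokens.clear();
            if (ins.dests.empty()) std::printf("result: %d\n", result);
            for (int d : ins.dests) pending.push({d, result});
        }
    }

    int main() {
        // pipe(f, g) compiled to two MDFi: 0 runs f, then feeds 1, which runs g.
        std::map<int, MDFi> graph;
        graph[0] = { [](const std::vector<int>& x) { return x[0] + 1; }, 1, {}, {1} };
        graph[1] = { [](const std::vector<int>& x) { return x[0] * 2; }, 1, {}, {} };
        interpret(graph, 0, 10);    // prints "result: 22"
    }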

Muskel also provides support for combining structured with unstructured programming,[44] and recent research has addressed extensibility.

A custom MPI abstraction layer, NetStream, is used to take care of primitive data type marshalling, synchronization, and so on.
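
The role of such a layer can be sketched as a thin C++ wrapper over MPI with overloaded send/receive calls per primitive type; the class below is hypothetical and does not reproduce NetStream's real interface.

    #include <mpi.h>
    #include <cstdio>

    // Hides MPI datatype handling behind simple overloaded calls.
    class Stream {
        int peer_;           // rank of the process at the other end
    public:
        explicit Stream(int peer) : peer_(peer) {}
        void send(int v)    { MPI_Send(&v, 1, MPI_INT,    peer_, 0, MPI_COMM_WORLD); }
        void send(double v) { MPI_Send(&v, 1, MPI_DOUBLE, peer_, 0, MPI_COMM_WORLD); }
        void recv(int& v)    { MPI_Recv(&v, 1, MPI_INT,    peer_, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); }
        void recv(double& v) { MPI_Recv(&v, 1, MPI_DOUBLE, peer_, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE); }
    };

    // Run with two processes, e.g. "mpirun -np 2 ./a.out".
    int main(int argc, char** argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        if (rank == 0) {            // producer side
            Stream to_worker(1);
            to_worker.send(42);
        } else if (rank == 1) {     // consumer side
            Stream from_master(0);
            int v; from_master.recv(v);
            std::printf("received %d\n", v);
        }
        MPI_Finalize();
    }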

Marrow[49][50] is a C++ algorithmic skeleton framework for the orchestration of OpenCL computations in possibly heterogeneous multi-GPU environments.

Moreover, the framework introduces optimizations that overlap communication and computation, hence masking the latency imposed by the PCIe bus.

Apart from expressing which kernel parameters may be decomposed and, when required, defining how the partial results should be merged, the programmer is completely abstracted from the underlying multi-GPU architecture.
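
That division of labour can be pictured with the hypothetical sketch below (not Marrow's actual API): the programmer supplies only a per-partition kernel and a merge function, while the framework splits the input and runs the partitions concurrently; CPU tasks stand in for GPU command queues here.

    #include <algorithm>
    #include <cstddef>
    #include <cstdio>
    #include <future>
    #include <numeric>
    #include <vector>

    template <typename In, typename Out>
    Out run_partitioned(const std::vector<In>& input,
                        int devices,
                        Out (*kernel)(const std::vector<In>&),   // per-partition work
                        Out (*merge)(const std::vector<Out>&)) { // user-defined merge
        std::vector<std::future<Out>> parts;
        std::size_t chunk = (input.size() + devices - 1) / devices;
        for (int d = 0; d < devices; ++d) {
            std::size_t lo = std::min(input.size(), static_cast<std::size_t>(d) * chunk);
            std::size_t hi = std::min(input.size(), lo + chunk);
            std::vector<In> slice(input.begin() + lo, input.begin() + hi);
            parts.push_back(std::async(std::launch::async, kernel, std::move(slice)));
        }
        std::vector<Out> partial;
        for (auto& p : parts) partial.push_back(p.get());
        return merge(partial);   // combine per-device results
    }

    static long sum_kernel(const std::vector<int>& xs) {
        return std::accumulate(xs.begin(), xs.end(), 0L);
    }
    static long sum_merge(const std::vector<long>& ps) {
        return std::accumulate(ps.begin(), ps.end(), 0L);
    }

    int main() {
        std::vector<int> data(1000);
        std::iota(data.begin(), data.end(), 1);
        std::printf("%ld\n", run_partitioned<int, long>(data, 2, sum_kernel, sum_merge));
    }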

As a unique feature, Muesli's data-parallel skeletons automatically scale on both single-core and multi-core, multi-node cluster architectures.[61]

However, this feature is optional in the sense that a program written with Muesli still compiles and runs on a single-core, multi-node cluster computer without changes to the source code, i.e. backward compatibility is guaranteed.

A template implements a skeleton on a specific architecture and provides a parametric process graph with a performance model.
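
For instance, a farm template is typically paired with a cost model stating that its steady-state service time is bounded by its slowest component (emitter, worker pool, or collector); the sketch below uses a deliberately simplified model of this kind rather than P3L's actual one.

    #include <algorithm>
    #include <cstdio>

    // Simplified farm cost model: the service time is bounded by the emitter,
    // by the pool of nw workers, or by the collector, whichever is slowest.
    double farm_service_time(double t_emit, double t_work, double t_collect, int nw) {
        return std::max({t_emit, t_work / nw, t_collect});
    }

    int main() {
        // With 5 ms of work per task, a 1 ms emitter and a 1 ms collector,
        // adding workers helps only until the emitter becomes the bottleneck.
        for (int nw = 1; nw <= 8; ++nw)
            std::printf("nw=%d  service time=%.2f ms\n",
                        nw, farm_service_time(1.0, 5.0, 1.0, nw));
    }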

A P3L module corresponds to a properly defined skeleton construct with input and output streams, and other sub-modules or sequential C code.[66]

SkIE[67] (Skeleton-based Integrated Environment) is quite similar to P3L, as it is also based on a coordination language, but provides advanced features such as debugging tools, performance analysis, visualization and a graphical user interface.

Instead of directly using the coordination language, programmers interact with a graphical tool, where parallel modules based on skeletons can be composed.

SBASCO (Skeleton-BAsed Scientific COmponents) is a programming environment oriented towards efficient development of parallel and distributed numerical applications.

SkePU is a C++ template library with six data-parallel and one task-parallel skeletons, two container types, and support for execution on multi-GPU systems with both CUDA and OpenCL.

More recently, support for hybrid execution, performance-aware dynamic scheduling, and load balancing has been added to SkePU by implementing a backend for the StarPU runtime system.
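
The underlying programming model can be pictured as follows; this is a hypothetical sketch rather than SkePU's real API, with sequential and threaded CPU stand-ins where SkePU would offer CUDA and OpenCL backends derived from the same user function.

    #include <cstddef>
    #include <cstdio>
    #include <future>
    #include <vector>

    enum class Backend { Sequential, Threaded };

    // A map skeleton object is built from one user function and can then be
    // executed by different backends without changing the user function.
    template <typename F>
    class MapSkeleton {
        F f_;
    public:
        explicit MapSkeleton(F f) : f_(f) {}
        std::vector<float> operator()(const std::vector<float>& in, Backend b) const {
            std::vector<float> out(in.size());
            if (b == Backend::Sequential) {
                for (std::size_t i = 0; i < in.size(); ++i) out[i] = f_(in[i]);
            } else {
                std::size_t half = in.size() / 2;       // two worker tasks
                auto job = [&](std::size_t lo, std::size_t hi) {
                    for (std::size_t i = lo; i < hi; ++i) out[i] = f_(in[i]);
                };
                auto t = std::async(std::launch::async, job, std::size_t{0}, half);
                job(half, in.size());
                t.get();
            }
            return out;
        }
    };

    int main() {
        MapSkeleton scale([](float x) { return 2.0f * x; });   // user function
        std::vector<float> v{1, 2, 3, 4};
        auto r = scale(v, Backend::Threaded);
        for (float x : r) std::printf("%.1f ", x);
        std::printf("\n");
    }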

QUAFF relies on template-based meta-programming techniques to reduce runtime overheads and perform skeleton expansions and optimizations at compilation time.

QUAFF is based on the CSP model, in which the skeleton program is described as a process network built from production rules (single, serial, par, join).
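
The effect of compile-time expansion can be illustrated with the generic C++ template sketch below (not QUAFF's actual interface): the composition pipe(seq(f), seq(g)) is encoded in a type, so the compiler can inline the whole skeleton tree and no skeleton-related dispatch is left at run time.

    #include <cstdio>

    // A sequential stage wrapping a user function.
    template <typename F>
    struct Seq {
        F f;
        template <typename T> auto operator()(T x) const { return f(x); }
    };

    // A two-stage composition: the structure lives entirely in the type.
    template <typename S1, typename S2>
    struct Pipe {
        S1 first; S2 second;
        template <typename T> auto operator()(T x) const { return second(first(x)); }
    };

    template <typename F>                   Seq<F>      seq(F f)       { return {f}; }
    template <typename A, typename B>       Pipe<A, B>  pipe(A a, B b) { return {a, b}; }

    int main() {
        auto inc    = [](int x) { return x + 1; };
        auto square = [](int x) { return x * x; };
        auto prog = pipe(seq(inc), seq(square));   // type encodes the skeleton tree
        std::printf("%d\n", prog(4));              // prints 25
    }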