Cilk

Cilk Arts, the company that developed Cilk++, was subsequently acquired by Intel, which increased compatibility with existing C and C++ code and called the result Cilk Plus.

The original Cilk language was based on ANSI C, with the addition of Cilk-specific keywords to signal parallelism.
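
A minimal sketch of how those keywords look on the classic Fibonacci function, in the later MIT Cilk (Cilk-5) style; this is an illustrative sketch rather than code taken from a specific release:

    cilk int fib(int n)
    {
        if (n < 2)
            return n;
        else {
            int x, y;
            x = spawn fib(n - 1);   /* child may run in parallel with the caller */
            y = spawn fib(n - 2);
            sync;                   /* wait until both children have returned */
            return x + y;
        }
    }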

The last version, Cilk 5.4.6, is available from the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL), but is no longer supported.

The company closed a Series A venture financing round in October 2007, and its product, Cilk++ 1.0, shipped in December 2008.

On July 31, 2009, Cilk Arts announced on its web site that its products and engineering team were now part of Intel Corp.[16]

The principle behind the design of the Cilk language is that the programmer should be responsible for exposing the parallelism, identifying elements that can safely be executed in parallel; it should then be left to the run-time environment, particularly the scheduler, to decide during execution how to actually divide the work between processors.

It is because these responsibilities are separated that a Cilk program can run without rewriting on any number of processors, including one.

The original Cilk (Cilk-1) used a rather different syntax that required programming in an explicit continuation-passing style; the Fibonacci example in that style is sketched below.[17] Inside fib's recursive case, the spawn_next keyword indicates the creation of a successor thread (as opposed to the child threads created by spawn), which executes the sum subroutine after waiting for the continuation variables x and y to be filled in by the recursive calls.
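
A hedged reconstruction of that continuation-passing version, based on the description above (thread, cont, spawn, spawn_next, and send_argument are Cilk-1 constructs, but treat the exact spelling as approximate):

    thread fib(cont int k, int n)
    {
        if (n < 2) {
            send_argument(k, n);          /* send the result to continuation k */
        } else {
            cont int x, y;                /* continuation variables to be filled in */
            spawn_next sum(k, ?x, ?y);    /* successor runs once x and y are ready */
            spawn fib(x, n - 1);          /* children fill in x and y */
            spawn fib(y, n - 2);
        }
    }

    thread sum(cont int k, int x, int y)
    {
        send_argument(k, x + y);          /* pass the sum on to the caller's continuation */
    }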

The optional "grain size" pragma determines the coarsening of a cilk_for loop: with a grain size of one hundred, for example, any sub-array of one hundred or fewer elements is processed sequentially, as in the sketch below.
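
A sketch of such a loop, assuming Cilk Plus spellings (cilk_for from <cilk/cilk.h> and the grainsize pragma); the function f is a placeholder for an arbitrary element-wise operation:

    #include <cilk/cilk.h>

    static int f(int x) { return 2 * x + 1; }    /* placeholder element-wise operation */

    void apply_f(int *a, int n)
    {
        #pragma cilk grainsize = 100             /* runs of <= 100 iterations stay sequential */
        cilk_for (int i = 0; i < n; i++)
            a[i] = f(a[i]);
    }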

Although the Cilk specification does not specify the exact behavior of the construct, the typical implementation is a divide-and-conquer recursion,[18] as if the programmer had rewritten the loop along the lines of the sketch below. The reasons for generating a divide-and-conquer program rather than the obvious alternative, a loop that spawn-calls the loop body as a function, lie both in the grainsize handling and in efficiency: doing all the spawning in a single task makes load balancing a bottleneck.
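
Roughly, the expansion looks like the following sketch (reusing the placeholder f and the grain size of 100 from the previous sketch; the real transformation is performed internally by the compiler and runtime):

    static void apply_f_recursive(int *a, int start, int end)
    {
        if (end - start <= 100) {                       /* at or below the grain size: sequential */
            for (int i = start; i < end; i++)
                a[i] = f(a[i]);
        } else {
            int mid = start + (end - start) / 2;
            cilk_spawn apply_f_recursive(a, start, mid);   /* left half may be stolen */
            apply_f_recursive(a, mid, end);                /* right half runs in this task */
            cilk_sync;                                     /* wait for the spawned half */
        }
    }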

The review also noted that reductions (e.g., sums over arrays) need additional code.[18]

Cilk++ added a new kind of object, the hyperobject, which allows multiple strands to share state without race conditions and without using explicit locks.[20]

The most common type of hyperobject is a reducer, which corresponds to the reduction clause in OpenMP or to the algebraic notion of a monoid.
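
A small usage sketch, assuming the Cilk Plus reducer library (<cilk/reducer_opadd.h>); the sum reducer corresponds to the monoid of integers under addition with identity 0:

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>

    long parallel_sum(const int *a, int n)
    {
        cilk::reducer_opadd<long> total(0);   /* each strand updates its own view */
        cilk_for (int i = 0; i < n; i++)
            total += a[i];                    /* no lock and no data race */
        return total.get_value();             /* views are combined with + at sync points */
    }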

Burckhardt et al. point out that even the sum reducer can result in non-deterministic behavior, describing a program that may produce either 1 or 2 depending on the scheduling order; a reconstruction is sketched below.[21]

Intel Cilk Plus adds notation to express high-level operations on entire arrays or sections of arrays; e.g., an axpy-style function that is ordinarily written as an explicit loop can be expressed in Cilk Plus with array sections, as in the second sketch below. This notation helps the compiler to vectorize the application effectively.
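
A reconstruction in the spirit of the program they describe, built here from the reducer_opadd interface rather than taken from their paper; whether the spawned call's continuation is stolen determines what the test of the reducer observes:

    #include <cilk/cilk.h>
    #include <cilk/reducer_opadd.h>
    #include <iostream>

    static void add1(cilk::reducer_opadd<int> &r) { r += 1; }

    int main()
    {
        cilk::reducer_opadd<int> r(0);
        cilk_spawn add1(r);
        if (r.get_value() == 0)    /* sees 0 only if the continuation was stolen */
            r += 1;
        cilk_sync;
        std::cout << r.get_value() << std::endl;   /* prints 1 or 2 */
        return 0;
    }

The axpy comparison, as a sketch: the first function is the ordinary serial loop, the second uses Cilk Plus array sections (the [start:length] notation) to express the same update over the whole array at once:

    /* Ordinary serial axpy: y <- alpha*x + y */
    void axpy(int n, float alpha, const float *x, float *y)
    {
        for (int i = 0; i < n; i++)
            y[i] += alpha * x[i];
    }

    /* The same operation in Cilk Plus array notation */
    void axpy_cilk(int n, float alpha, const float *x, float *y)
    {
        y[0:n] += alpha * x[0:n];
    }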

Intel Cilk Plus allows C/C++ operations to be applied to multiple array elements in parallel, and also provides a set of built-in functions that can be used to perform vectorized shifts, rotates, and reductions.
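
For example, a dot product can combine an element-wise section operation with the built-in sum reduction; a sketch, assuming the __sec_reduce_add intrinsic:

    float dot(int n, const float *x, const float *y)
    {
        /* multiply the two sections element-wise, then reduce the result with + */
        return __sec_reduce_add(x[0:n] * y[0:n]);
    }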

Similar functionality exists in Fortran 90; Cilk Plus differs in that it never allocates temporary arrays, so memory usage is easier to predict.

This pragma, Cilk Plus's #pragma simd, gives the compiler permission to vectorize a loop even in cases where auto-vectorization might fail.
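
A minimal sketch of the pragma in use (function and variable names are illustrative):

    void scale_add(int n, float a, const float *x, float *y)
    {
        #pragma simd                  /* vectorize even if the compiler cannot prove it safe */
        for (int i = 0; i < n; i++)
            y[i] += a * x[i];
    }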

The Cilk scheduler uses a policy called "work-stealing" to divide procedure execution efficiently among multiple processors.

The processor maintains a stack on which it places each frame that it has to suspend in order to handle a procedure call.
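
The following toy program (plain C, single-threaded, with no synchronization) sketches only the push/pop/steal discipline of such a per-worker structure; it is a schematic illustration, not the Cilk runtime:

    #include <stdio.h>

    #define CAP 64

    typedef struct {
        int frames[CAP];   /* stand-ins for suspended procedure frames */
        int top, bottom;   /* owner works at the bottom, thieves take from the top */
    } Deque;

    static void push_bottom(Deque *d, int f) { d->frames[d->bottom++] = f; }

    static int pop_bottom(Deque *d)            /* owner resumes its newest frame */
    {
        return d->bottom > d->top ? d->frames[--d->bottom] : -1;
    }

    static int steal_top(Deque *d)             /* a thief takes the oldest frame */
    {
        return d->bottom > d->top ? d->frames[d->top++] : -1;
    }

    int main(void)
    {
        Deque d = { .top = 0, .bottom = 0 };
        push_bottom(&d, 1);                    /* frames suspended in call order */
        push_bottom(&d, 2);
        push_bottom(&d, 3);
        printf("owner pops frame %d\n", pop_bottom(&d));    /* 3: most recent */
        printf("thief steals frame %d\n", steal_top(&d));   /* 1: oldest */
        return 0;
    }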