[5] Essentially, a GPGPU pipeline is a kind of parallel processing between one or more GPUs and CPUs that analyzes data as if it were in image or other graphic form.
Migrating data into graphical form and then using the GPU to scan and analyze it can create a large speedup.
In 1987, Conway's Game of Life became one of the first examples of general-purpose computing using an early stream processor called a blitter to invoke a special sequence of logical operations on bit vectors.
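The blitter trick, building each new generation out of logical operations applied to whole bit vectors at once, can be sketched in ordinary Python: pack each grid row into an integer, then form neighbor counts with shifts and bitwise full-adder logic. This is an illustrative sketch of the bit-parallel idea, not the actual blitter operation sequence used in 1987.

```python
def life_step(rows, width):
    """One Game of Life generation using only shifts and bitwise logic.
    Each row of the grid is one integer; the grid wraps vertically and
    has dead cells beyond the left/right edges."""
    mask = (1 << width) - 1

    def add3(a, b, c):  # bitwise full adder: sum bit, carry bit
        return a ^ b ^ c, (a & b) | (b & c) | (a & c)

    out = []
    n = len(rows)
    for i, cur in enumerate(rows):
        above, below = rows[i - 1], rows[(i + 1) % n]
        nb = [(w << 1) & mask for w in (above, cur, below)] + \
             [w >> 1 for w in (above, cur, below)] + [above, below]
        s1, c1 = add3(nb[0], nb[1], nb[2])
        s2, c2 = add3(nb[3], nb[4], nb[5])
        s3, c3 = add3(nb[6], nb[7], 0)
        bit0, cl = add3(s1, s2, s3)      # bit 0 of the neighbor count
        t1, t2 = add3(c1, c2, c3)
        bit1, cb = t1 ^ cl, t1 & cl      # bit 1 and its carry
        bit2, bit3 = t2 ^ cb, t2 & cb    # bits 2 and 3
        # Alive next step if count == 3, or count == 2 and currently alive.
        out.append(bit1 & ~bit2 & ~bit3 & (bit0 | cur) & mask)
    return out
```

Every cell in a row is updated by the same handful of word-wide logical operations, which is exactly the kind of workload a blitter (or a modern GPU) executes efficiently.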
[6] General-purpose computing on GPUs became more practical and popular after about 2001, with the advent of both programmable shaders and floating point support on graphics processors.
A significant milestone for GPGPU came in 2003, when two research groups independently developed GPU-based approaches for solving general linear algebra problems that ran faster than CPU implementations.
These early implementations had to express their computations in graphics terms; this cumbersome translation was obviated by the advent of general-purpose programming languages and APIs such as Sh/RapidMind, Brook and Accelerator.
[12] This means that modern GPGPU pipelines can leverage the speed of a GPU without requiring full and explicit conversion of the data to a graphical form.
[citation needed] Any language that allows the code running on the CPU to poll a GPU shader for return values can create a GPGPU framework.
Programming standards for parallel computing include OpenCL (vendor-independent), OpenACC, OpenMP and OpenHMPP.
[citation needed] OpenCL provides a cross-platform GPGPU platform that additionally supports data parallel compute on CPUs.
The Khronos Group has also standardised and implemented SYCL, a higher-level programming model for OpenCL, as a single-source domain-specific embedded language based on pure C++11.
For early fixed-function or limited-programmability graphics (i.e., up to and including DirectX 8.1-compliant GPUs), integer pixel formats were sufficient because these are also the representations used in displays.
Many GPGPU applications require floating point accuracy, which came with video cards conforming to the DirectX 9 specification.
There have been efforts to emulate double-precision floating-point values on GPUs; however, the speed tradeoff can negate any benefit of offloading the computation onto the GPU in the first place.
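Such emulation schemes are typically built from error-free transformations that represent one higher-precision value as an unevaluated sum of two lower-precision floats. The core building block, Knuth's TwoSum, can be sketched in Python (using Python's double-precision floats in place of the GPU's single-precision ones):

```python
def two_sum(a, b):
    """Error-free addition: returns (s, err) such that a + b == s + err
    exactly, where s is the rounded sum and err the rounding error."""
    s = a + b
    a_round = s - b          # portion of s attributed to a
    b_round = s - a_round    # portion of s attributed to b
    err = (a - a_round) + (b - b_round)
    return s, err

# The low-order part lost by naive addition is recovered exactly:
s, err = two_sum(1e16, 1.0)   # s == 1e16, err == 1.0
```

Doing every arithmetic operation this way roughly doubles the significand width, but at a cost of many extra instructions per operation, which is the speed tradeoff noted above.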
[citation needed] Examples of the data types traditionally handled by GPUs include vertices, colors, normal vectors, and texture coordinates.
As time progressed, however, it became valuable for GPUs to store at first simple, then increasingly complex structures of data to be passed back to the CPU, such as the results of analyzing an image, or a set of scientific data represented in a 2D or 3D format that a video card can understand.
The distinguishing feature of a GPGPU design is the ability to transfer information bidirectionally, from the GPU back to the CPU; ideally, data throughput in both directions is high, resulting in a multiplier effect on the speed of a specific high-use algorithm.
A more advanced example might use edge detection to return both numerical information and a processed image representing outlines to a computer vision program controlling, say, a mobile robot.
Because the GPU has fast and local hardware access to every pixel or other picture element in an image, it can analyze and average it (for the first example) or apply a Sobel edge filter or other convolution filter (for the second) with much greater speed than a CPU, which typically must access slower random-access memory copies of the graphic in question.
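The Sobel filter mentioned in the second example can be sketched in plain Python. On a GPU, each output pixel would be computed by an independent shader invocation or thread; this sketch simply loops over the pixels sequentially.

```python
import math

def sobel(img):
    """Gradient magnitude via the Sobel operator.
    img: 2D list of grayscale values; border pixels are left at zero."""
    gx_k = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]    # horizontal-gradient kernel
    gy_k = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]    # vertical-gradient kernel
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(1, h - 1):          # on a GPU: one thread per (x, y)
        for x in range(1, w - 1):
            gx = sum(gx_k[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            gy = sum(gy_k[j][i] * img[y - 1 + j][x - 1 + i]
                     for j in range(3) for i in range(3))
            out[y][x] = math.hypot(gx, gy)
    return out
```

Each output pixel depends only on a fixed 3x3 neighborhood of the input, so all pixels can be computed in parallel, which is why convolution filters map so well onto GPUs.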
Specialized equipment designs may, however, even further enhance the efficiency of GPGPU pipelines, which traditionally perform relatively few algorithms on very large amounts of data.
Historically, CPUs have used hardware-managed caches, but the earlier GPUs only provided software-managed local memories.
Due to their design, GPUs are only effective for problems that can be solved using stream processing and the hardware can only be used in certain ways.
The following discussion referring to vertices, fragments and textures concerns mainly the legacy model of GPGPU programming, where graphics APIs (OpenGL or DirectX) were used to perform general-purpose computation.
It is important for GPGPU applications to have high arithmetic intensity; otherwise, memory access latency will limit the computational speedup.
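The point can be made quantitative with a simple roofline-style estimate. The peak-throughput and bandwidth figures below are hypothetical, not those of any specific GPU.

```python
def attainable_gflops(peak_gflops, bandwidth_gb_per_s, flops_per_byte):
    """Roofline model: a kernel is capped either by raw compute throughput
    or by how much data the memory system can deliver."""
    return min(peak_gflops, bandwidth_gb_per_s * flops_per_byte)

# Hypothetical GPU: 10 TFLOP/s peak, 500 GB/s memory bandwidth.
# SAXPY (y = a*x + y) performs 2 flops per element while moving 12 bytes
# (read x, read y, write y as 4-byte floats): intensity = 2/12 flop/byte.
saxpy = attainable_gflops(10000, 500, 2 / 12)   # memory-bound, far below peak
dense = attainable_gflops(10000, 500, 50)       # compute-bound, hits the peak
```

A low-intensity kernel like SAXPY reaches only a small fraction of peak throughput no matter how fast the ALUs are, while a high-intensity kernel such as blocked matrix multiplication can approach the compute limit.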
A variety of computational resources are available on the GPU. For output, a program can substitute a write-only texture in place of the framebuffer.
The most common form for a stream to take in GPGPU is a 2D grid because this fits naturally with the rendering model built into GPUs.
Many computations naturally map into grids: matrix algebra, image processing, physically based simulation, and so on.
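In the legacy model, such a grid computation is expressed by drawing a full-screen quad so that the fragment shader runs once per output cell. The control structure amounts to the following sketch, where the helper name `launch2d` is ours; it runs sequentially here but is data-parallel on the GPU.

```python
def launch2d(kernel, width, height, *inputs):
    """Run `kernel(x, y, *inputs)` once per cell of a width x height grid.
    On a GPU every invocation executes in parallel; here we simply loop."""
    return [[kernel(x, y, *inputs) for x in range(width)]
            for y in range(height)]

# Example grid computation: elementwise matrix addition.
def add_kernel(x, y, a, b):
    return a[y][x] + b[y][x]

a = [[1, 2], [3, 4]]
b = [[10, 20], [30, 40]]
result = launch2d(add_kernel, 2, 2, a, b)   # [[11, 22], [33, 44]]
```

Because the kernel for one cell never writes to another cell's output, the grid cells are independent, which is exactly the property that lets matrix algebra, image processing and stencil-style simulation map onto this model.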
In sequential code it is possible to control the flow of the program using if-then-else statements and various forms of loops.
While a scan (prefix sum) may at first glance seem inherently serial, efficient parallel scan algorithms are possible and have been implemented on graphics processing units.
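One such algorithm is the Hillis–Steele scan, which computes all prefix sums in O(log n) data-parallel steps; each iteration below corresponds to one pass in which every element can be updated simultaneously. A minimal Python sketch:

```python
def inclusive_scan(values):
    """Hillis-Steele inclusive prefix sum: O(log n) data-parallel steps."""
    a = list(values)
    offset = 1
    while offset < len(a):
        # Each element adds in its partner from `offset` positions back.
        # On a GPU, all of these updates happen in parallel in one pass.
        a = [a[i] + (a[i - offset] if i >= offset else 0)
             for i in range(len(a))]
        offset *= 2
    return a
```

The algorithm performs more total additions than a sequential scan (O(n log n) versus O(n)), but trades that extra work for a logarithmic number of parallel steps, a common bargain in GPU programming.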
A variety of data structures can be represented on the GPU. GPUs have been used for general-purpose computing in many areas, among them bioinformatics.[62][86] († Expected speedups are highly dependent on system configuration.)