Blue Gene was an IBM project aimed at designing supercomputers that could reach operating speeds in the petaFLOPS (PFLOPS) range with relatively low power consumption.[4]
After Blue Gene/Q, IBM focused its supercomputer efforts on the OpenPOWER platform, using accelerators such as FPGAs and GPUs to address the diminishing returns of Moore's law.[7]
In December 1999, IBM announced a US$100 million research initiative for a five-year effort to build a massively parallel computer, to be applied to the study of biomolecular phenomena such as protein folding.[9]
The project had two main goals: to advance understanding of the mechanisms behind protein folding via large-scale simulation, and to explore novel ideas in massively parallel machine architecture and software.
In parallel, Alan Gara had started working on an extension of the QCDOC architecture into a more general-purpose supercomputer.
In November 2004, a 16-rack system, with each rack holding 1,024 compute nodes, achieved first place in the TOP500 list with a LINPACK benchmark performance of 70.72 TFLOPS.
Blue Gene/L was the first supercomputer ever to sustain over 100 TFLOPS on a real-world application, namely a three-dimensional molecular dynamics code (ddcMD) simulating the solidification (nucleation and growth processes) of molten metal under high pressure and temperature conditions.
In June 2006, NNSA and IBM announced that Blue Gene/L achieved 207.3 TFLOPS on a quantum chemical application (Qbox).[13]
In 2007, a team from the IBM Almaden Research Center and the University of Nevada ran an artificial neural network almost half as complex as the brain of a mouse for the equivalent of a second (the network was run at one-tenth of normal speed for 10 seconds).
The dual FPUs gave each Blue Gene/L node a theoretical peak performance of 5.6 GFLOPS (gigaFLOPS).[18]
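That figure can be reproduced from the node's published configuration. The sketch below is purely illustrative arithmetic, assuming the node's two PowerPC 440 cores at 700 MHz, each with a dual-pipeline FPU issuing two fused multiply-adds (four flops) per cycle:

    # Back-of-the-envelope check of the 5.6 GFLOPS per-node figure.
    CLOCK_HZ = 700e6          # 700 MHz PowerPC 440 core clock
    CORES_PER_NODE = 2        # two cores per compute node
    FLOPS_PER_CYCLE = 4       # 2 fused multiply-adds/cycle, 2 flops each

    peak = CLOCK_HZ * CORES_PER_NODE * FLOPS_PER_CYCLE
    print(f"Peak per node: {peak / 1e9:.1f} GFLOPS")  # -> 5.6 GFLOPS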
By integrating all essential sub-systems on a single chip and using low-power logic, each compute or I/O node dissipated about 17 watts (including DRAM).
The system was able to electrically isolate faulty components, down to a granularity of half a rack (512 compute nodes), allowing the machine to continue running.
The I/O nodes, which ran the Linux operating system, provided communication to storage and external hosts via an Ethernet network.
A separate and private Ethernet management network provided access to any node for configuration, booting and diagnostics.
To allow multiple programs to run concurrently, a Blue Gene/L system could be partitioned into electronically isolated sets of nodes.
Blue Gene/L compute nodes used a minimal operating system supporting a single user program.
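The half-rack isolation granularity above implies that a job partition is built from whole 512-node blocks. The following sketch is illustrative only; it is not IBM's control software, the partition_size helper is hypothetical, and it merely models rounding a request up to that granularity:

    MIDPLANE_NODES = 512  # electrical isolation granularity: half a rack

    def partition_size(requested_nodes: int) -> int:
        """Smallest whole number of 512-node blocks covering the request."""
        blocks = -(-requested_nodes // MIDPLANE_NODES)  # ceiling division
        return blocks * MIDPLANE_NODES

    print(partition_size(300))   # -> 512  (one half-rack block)
    print(partition_size(1500))  # -> 1536 (three half-rack blocks)

In practice, valid partition shapes were further constrained by the machine's interconnect topology, so this rounding is a simplification.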
As of November 2009, the TOP500 list contained 15 Blue Gene/P installations of two racks (2,048 nodes, 8,192 processor cores, 23.86 TFLOPS Linpack) and larger.
The A2 processor core is 4-way simultaneously multithreaded and was augmented with a SIMD quad-vector double-precision floating-point unit (IBM QPX).
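The QPX unit's four-wide vectors make Blue Gene/Q's per-node peak straightforward to estimate. The sketch below is illustrative, assuming the published 1.6 GHz A2 clock and the 16 compute cores per node implied by the TOP500 entries cited below (512 nodes, 8,192 cores):

    CLOCK_HZ = 1.6e9          # published A2 clock in Blue Gene/Q
    CORES_PER_NODE = 16       # compute cores per node (8,192 / 512)
    FLOPS_PER_CYCLE = 4 * 2   # QPX vector width x 2 flops per FMA

    peak = CLOCK_HZ * CORES_PER_NODE * FLOPS_PER_CYCLE
    print(f"Peak per node: {peak / 1e9:.1f} GFLOPS")  # -> 204.8 GFLOPS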
The same 4-rack system achieved the top position in the Graph500 list[3] with over 250 GTEPS (giga traversed edges per second).
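GTEPS measures billions of edges traversed per second during a breadth-first search. The toy, single-threaded sketch below only illustrates how the metric is defined; it bears no resemblance to an actual Graph500 run across thousands of nodes:

    import time
    from collections import deque

    def bfs_teps(adj, root):
        """Traversed edges per second for one BFS from root."""
        visited = {root}
        queue = deque([root])
        edges = 0
        start = time.perf_counter()
        while queue:
            u = queue.popleft()
            for v in adj[u]:
                edges += 1                # every examined edge is a traversal
                if v not in visited:
                    visited.add(v)
                    queue.append(v)
        return edges / (time.perf_counter() - start)

    # Tiny example graph as adjacency lists.
    adj = {0: [1, 2], 1: [0, 3], 2: [0, 3], 3: [1, 2]}
    print(f"{bfs_teps(adj, 0):,.0f} TEPS")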
As of June 2012, the TOP500 list contained 20 Blue Gene/Q installations of half a rack (512 nodes, 8,192 processor cores, 86.35 TFLOPS Linpack) and larger.
A fully compressible flow solver also achieved 14.4 PFLOPS (up from an initial 11 PFLOPS) on Sequoia, 72% of the machine's nominal peak performance.
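The 72% figure is consistent with Sequoia's published size of 96 racks of 1,024 nodes each, combined with the 204.8 GFLOPS per-node peak sketched above; the check below is illustrative arithmetic, not a reported measurement:

    NODES = 96 * 1024                # Sequoia: 96 racks x 1,024 nodes
    PEAK_PER_NODE = 204.8e9          # flops/s per node (see earlier sketch)
    ACHIEVED = 14.4e15               # flow solver's sustained flops/s

    peak = NODES * PEAK_PER_NODE     # ~20.1 PFLOPS nominal peak
    print(f"{ACHIEVED / peak:.0%}")  # -> 72%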