Early supercomputer architectures pioneered by Seymour Cray relied on compact, innovative designs and local parallelism to achieve superior computational peak performance.
[11][12] Since the late 1960s the growth in the power and proliferation of supercomputers has been dramatic, and the underlying architectural directions of these systems have taken significant turns.
[4] Seymour Cray's "get the heat out" motto was central to his design philosophy and has continued to be a key issue in supercomputer architectures, e.g., in large-scale experiments such as Blue Waters.
[9] As the price/performance of general-purpose graphics processing units (GPGPUs) has improved, a number of petaflop supercomputers such as Tianhe-I and Nebulae have started to rely on them.
In the shared-register approach, all processors had access to shared registers that were not used to move bulk data back and forth but only for interprocessor communication and synchronization.
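As a loose illustration of this idea only (POSIX threads and a C11 atomic flag standing in for hardware processors and shared registers, with arbitrary names and sizes, not Cray's actual mechanism), the following minimal sketch keeps bulk data in ordinary shared memory while a single shared flag is used purely for synchronization:

    /* Illustrative sketch: the shared atomic flag plays the role of a
     * shared synchronization register; the bulk data is never passed
     * through it. Thread names and the array size are arbitrary. */
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define N 1000000

    static double data[N];        /* bulk data in ordinary shared memory */
    static atomic_int ready = 0;  /* "shared register": synchronization only */

    static void *producer(void *arg) {
        (void)arg;
        for (int i = 0; i < N; i++)
            data[i] = (double)i;
        atomic_store_explicit(&ready, 1, memory_order_release);
        return NULL;
    }

    static void *consumer(void *arg) {
        (void)arg;
        while (!atomic_load_explicit(&ready, memory_order_acquire))
            ;                     /* spin until the producer signals */
        double sum = 0.0;
        for (int i = 0; i < N; i++)
            sum += data[i];
        printf("sum = %f\n", sum);
        return NULL;
    }

    int main(void) {
        pthread_t p, c;
        pthread_create(&p, NULL, producer, NULL);
        pthread_create(&c, NULL, consumer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }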
However, inherent challenges in managing a large amount of shared memory among many processors resulted in a move to more distributed architectures.
[27] During the 1980s, as the demand for computing power increased, the trend toward a much larger number of processors began, ushering in the age of massively parallel systems, with distributed memory and distributed file systems,[2] given that shared-memory architectures could not scale to a large number of processors.
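The distributed-memory style can be sketched as follows; MPI is used here only as a common, illustrative interface, not as a description of any particular machine, and the "local work" is a stand-in value:

    /* Illustrative distributed-memory sketch: no shared memory between
     * processes; each rank works on its own local data, and partial
     * results are combined via an explicit message-passing step. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);

        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)(rank + 1);  /* stand-in for local work */
        double total = 0.0;

        /* Combine partial results with an explicit reduction message. */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);

        if (rank == 0)
            printf("sum over %d processes = %f\n", size, total);

        MPI_Finalize();
        return 0;
    }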
[30] Computer clustering relies on a centralized management approach that makes the nodes available as orchestrated shared servers.
[31][32] When a large number of local semi-independent computing nodes are used (e.g., in a cluster architecture), the speed and flexibility of the interconnect become very important.
Modern supercomputers have taken different approaches to address this issue, e.g., Tianhe-1 uses a proprietary high-speed network based on InfiniBand QDR, enhanced with FeiTeng-1000 CPUs.
[12] Massive centralized systems at times use special-purpose processors designed for a specific application, and may use field-programmable gate array (FPGA) chips to gain performance by sacrificing generality.
[39] Some BOINC applications have reached multi-petaflop levels by using close to half a million computers connected over the internet, whenever volunteer resources become available.
Although grid computing has had success in parallel task execution, demanding supercomputer applications such as weather simulations or computational fluid dynamics have remained out of reach, partly because of the difficulty of reliably sub-assigning a large number of tasks and of guaranteeing the availability of resources at a given time.
[44] The quasi-opportunistic approach enables the execution of demanding applications within computer grids by establishing grid-wise resource-allocation agreements and fault-tolerant message passing to abstractly shield against the failures of the underlying resources, thus maintaining some opportunism while allowing a higher level of control.
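The general shape of such fault-tolerant message passing can be sketched as below; everything in this sketch is hypothetical (send_once() merely simulates an unreliable grid resource and is not the API of any real middleware), the point being that retries and failover hide individual resource failures from the application:

    /* Hypothetical sketch of fault-tolerant message passing: send_once()
     * simulates an unreliable resource; send_with_retry() shields the
     * caller by retrying and then failing over to a backup node. */
    #include <stdbool.h>
    #include <stdio.h>
    #include <stdlib.h>

    static bool send_once(int node, const char *msg) {
        if (rand() % 3 == 0) {                /* simulate a dropped node */
            fprintf(stderr, "node %d unreachable\n", node);
            return false;
        }
        printf("delivered to node %d: %s\n", node, msg);
        return true;
    }

    static bool send_with_retry(int primary, int backup, const char *msg) {
        for (int attempt = 0; attempt < 3; attempt++)
            if (send_once(primary, msg))
                return true;
        return send_once(backup, msg);        /* fail over to an alternate */
    }

    int main(void) {
        srand(42);
        if (!send_with_retry(7, 12, "task input block"))
            fprintf(stderr, "both nodes failed; reschedule the task\n");
        return 0;
    }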
[53] Tianhe-1A has 112 computer cabinets and 262 terabytes of distributed memory; 2 petabytes of disk storage is implemented via the Lustre clustered file system.
[57] The limits of specific approaches continue to be tested, as boundaries are reached through large-scale experiments, e.g., in 2011 IBM ended its participation in the Blue Waters petaflops project at the University of Illinois.
[6] The goal of a sustained petaflop led to design choices that favored single-core performance, and hence a lower core count.
[6] Blue Waters had been expected to run at sustained speeds of at least one petaflop, and relied on a specific water-cooling approach to manage heat.
IBM released the Power 775 computing node derived from that project's technology soon thereafter, but effectively abandoned the Blue Waters approach.