Kepler (microarchitecture)

The architecture is named after Johannes Kepler, a German mathematician and key figure in the 17th-century scientific revolution.

[2][3] The efficiency aim was achieved through the use of a unified GPU clock, simplified static scheduling of instructions, and a greater emphasis on performance per watt.

[4] Abandoning the shader clock found in Nvidia's previous GPU designs increases efficiency, even though additional cores are then required to achieve higher levels of performance.

[5] The programmability aim was achieved with Kepler's Hyper-Q, Dynamic Parallelism, and multiple new Compute Capability 3.x features.

[9] On the HPC models, the GK110 and GK210, the SMX count was raised to between 13 and 15 depending on the product, and more FP64 cores were included to bring the FP64 compute rate up to one third of the FP32 rate.
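The one-third ratio can be checked with simple arithmetic. The sketch below is illustrative only: the per-SMX counts of 192 FP32 cores and 64 FP64 units are commonly cited GK110 figures that are assumed here rather than taken from the text, and the 1000 MHz clock is a hypothetical round number.

```python
from fractions import Fraction

# Assumed per-SMX unit counts for GK110 (commonly cited figures,
# not stated in the surrounding text).
FP32_CORES_PER_SMX = 192
FP64_UNITS_PER_SMX = 64


def fp64_to_fp32_ratio(fp64_units=FP64_UNITS_PER_SMX,
                       fp32_cores=FP32_CORES_PER_SMX):
    """Return the FP64:FP32 throughput ratio as an exact fraction."""
    return Fraction(fp64_units, fp32_cores)


def peak_gflops(smx_count, units_per_smx, clock_mhz):
    """Theoretical peak, counting one fused multiply-add (2 FLOPs)
    per unit per cycle."""
    return 2 * smx_count * units_per_smx * clock_mhz / 1000.0


if __name__ == "__main__":
    clock_mhz = 1000  # hypothetical clock, for illustration only
    for smx in (13, 15):
        fp32 = peak_gflops(smx, FP32_CORES_PER_SMX, clock_mhz)
        fp64 = peak_gflops(smx, FP64_UNITS_PER_SMX, clock_mhz)
        print(f"{smx} SMX: FP32 {fp32:.0f} GFLOPS, FP64 {fp64:.0f} GFLOPS")
    print("FP64:FP32 ratio =", fp64_to_fp32_ratio())
```

Under these assumptions the ratio is independent of both the SMX count and the clock, since each cancels out of the quotient.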

The texture cache, which programmers had already been using for compute as a read-only buffer in previous generations, was increased in size, and its data path was optimized for faster throughput when used in this way.

[10] Additional die space reduction and power savings were achieved by removing a complex hardware block that handled the prevention of data hazards.

This clock speed is set to the level which will ensure that the GPU stays within TDP specifications, even at maximum loads.

[5] By taking this approach, the GPU ramps its clock up or down dynamically, so that it provides the maximum speed possible while remaining within TDP specifications.

The power target, as well as the size of the clock increase steps that the GPU will take, are both adjustable via third-party utilities and provide a means of overclocking Kepler-based cards.
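The interaction between the power target and the clock step can be illustrated with a toy model. This is a minimal sketch, not Nvidia's actual boost algorithm: the linear power model, the `watts_per_mhz` constant, the 13 MHz step, and all clock and wattage values are hypothetical and chosen only for illustration.

```python
def linear_power(load, watts_per_mhz=0.17):
    """Hypothetical power model: draw grows linearly with clock and load."""
    return lambda clock_mhz: load * watts_per_mhz * clock_mhz


def boost_clock(base_mhz, max_mhz, step_mhz, power_target_w, power_at):
    """Raise the clock from the base in fixed steps for as long as the
    estimated power at the next step stays within the power target."""
    clock = base_mhz
    while (clock + step_mhz <= max_mhz
           and power_at(clock + step_mhz) <= power_target_w):
        clock += step_mhz
    return clock


if __name__ == "__main__":
    light = linear_power(load=0.8)
    heavy = linear_power(load=1.0)
    # Light load leaves power headroom, so the clock steps upward.
    print(boost_clock(1000, 1100, 13, 170, light))
    # Heavy load already sits near the target, so the clock stays at base.
    print(boost_clock(1000, 1100, 13, 170, heavy))
    # Raising the power target lets the same heavy load boost as well.
    print(boost_clock(1000, 1100, 13, 190, heavy))
```

In this model, raising `power_target_w` or changing `step_mhz` plays the role of the adjustments that third-party utilities expose.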

[17] Exclusive to Kepler GPUs, TXAA is a new anti-aliasing method from Nvidia that is designed for direct implementation into game engines.

The simple nature of Hyper-Q is further reinforced by the fact that it is easily mapped to MPI, a common message-passing interface frequently used in HPC.

By increasing the number of MPI jobs, it is possible to use Hyper-Q on these algorithms to improve efficiency, all without changing the code itself.
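A rough way to see why more MPI jobs can help is to model how many jobs the GPU can accept at once. The sketch below is a deliberately simplified model, not a description of the hardware scheduler: it assumes each job alone uses only a fraction of the GPU, that every job takes the same time, and that the number of hardware work queues is 1 without Hyper-Q and 32 with it (figures assumed here, not stated in the text). The function and parameter names are hypothetical.

```python
import math


def makespan(num_jobs, job_time, jobs_to_fill_gpu, hw_queues):
    """Total time to finish num_jobs equal jobs.

    jobs_to_fill_gpu: how many such jobs are needed to saturate the GPU.
    hw_queues: how many jobs the hardware can accept concurrently.
    """
    # Concurrency is limited by both the queue count and GPU capacity.
    concurrent = max(1, min(hw_queues, jobs_to_fill_gpu))
    # Jobs run in waves of `concurrent` jobs each.
    waves = math.ceil(num_jobs / concurrent)
    return waves * job_time


if __name__ == "__main__":
    # 16 MPI ranks, each submitting a job that uses about a quarter of the GPU.
    print("single queue:", makespan(16, 1.0, 4, hw_queues=1))
    print("32 queues:   ", makespan(16, 1.0, 4, hw_queues=32))
```

With a single queue the jobs serialize and the GPU is mostly idle during each one. With multiple queues the same unmodified jobs overlap until the GPU is full.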

Like the previous-generation Fermi, Kepler cannot gain additional processing power by dual-issuing MAD+MUL, as Tesla could.