[3] GCN requires considerably more transistors than TeraScale, but offers advantages for general-purpose GPU (GPGPU) computation due to a simpler compiler.
GCN was also used in the graphics portion of Accelerated Processing Units (APUs), including those in the PlayStation 4 and Xbox One.
[7] MIAOW is an open-source RTL implementation of the AMD Southern Islands GPGPU microarchitecture.
In November 2015, AMD announced its Boltzmann Initiative, which aims to enable the porting of CUDA-based applications to a common C++ programming model.
AMD has claimed that each GCN compute unit (CU) has a 64 KiB Local Data Share (LDS).
[16] The CU scheduler is the hardware functional block that chooses which wavefronts the SIMD-VU executes.
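The scheduler's role can be illustrated with a toy model. The sketch below is not AMD's arbitration logic, which the text does not describe. It assumes a simple round-robin policy and a fixed, hypothetical stall time per wavefront. The function name `schedule` and its parameters are illustrative only.

```python
def schedule(stall_cycles, num_cycles):
    """Toy round-robin wavefront scheduler (illustrative assumption only).

    stall_cycles[i] is the hypothetical number of cycles wavefront i must
    wait after issuing one instruction (for example, a memory access).
    Returns, for each cycle, the index of the wavefront that issued,
    or None if every wavefront was stalled in that cycle.
    """
    count = len(stall_cycles)
    ready_at = [0] * count      # first cycle in which each wavefront may issue
    next_candidate = 0          # round-robin starting point
    issued = []
    for cycle in range(num_cycles):
        chosen = None
        for offset in range(count):
            wavefront = (next_candidate + offset) % count
            if ready_at[wavefront] <= cycle:
                chosen = wavefront
                break
        if chosen is not None:
            # The wavefront issues now, then stalls for its latency.
            ready_at[chosen] = cycle + 1 + stall_cycles[chosen]
            next_candidate = (chosen + 1) % count
        issued.append(chosen)
    return issued
```

With a single wavefront that stalls for three cycles, most cycles issue nothing. With four such wavefronts, the model issues an instruction every cycle, which is the effect that keeping several wavefronts resident is meant to achieve.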
AMD and Nvidia chose similar approaches to hide this unavoidable latency: the grouping of multiple threads.
A group of threads is the most basic unit of scheduling of GPUs that implement this approach to hide latency.
In the context of SSE instructions, this most basic level of parallelism is often called the "vector width".
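As a minimal sketch of this grouping, the function below splits a number of threads into fixed-size groups. The default group size of 64 is the GCN wavefront width (Nvidia's equivalent, the warp, contains 32 threads). The name `group_threads` is hypothetical and chosen only for illustration.

```python
WAVEFRONT_SIZE = 64  # threads per GCN wavefront; an Nvidia warp would use 32


def group_threads(num_threads, group_size=WAVEFRONT_SIZE):
    """Return (first, last_exclusive) thread-index ranges, one per group.

    The final group may be only partially filled when num_threads is not
    a multiple of group_size.
    """
    if num_threads < 0 or group_size <= 0:
        raise ValueError("num_threads must be >= 0 and group_size > 0")
    return [
        (start, min(start + group_size, num_threads))
        for start in range(0, num_threads, group_size)
    ]
```

For example, 200 threads form four groups, the last of which covers only eight threads (indices 192 to 199). Hardware that schedules whole groups still spends a full group's worth of lanes on that partial group.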
VCE 3.0 formed a part of the third generation of GCN, adding high-quality video scaling and the HEVC (H.265) codec.
In a 2011 preview, AnandTech wrote about the unified virtual memory supported by Graphics Core Next.
[22] This very first implementation focuses on a single "Kaveri" APU and works alongside the existing Radeon kernel graphics driver (kgd).
[25] A driver update has enabled the hardware schedulers in third generation GCN parts for production use.
A Shader Engine comprises one geometry processor, up to 11 CUs (the Hawaii chip has 44 CUs across its four Shader Engines), rasterizers, ROPs, and L1 cache.
[32] At the AMD Developer Summit (APU) in November 2013, Michael Mantor presented the Radeon R9 290X.
[33] GCN 3rd generation[34] was introduced in 2014 with the Radeon R9 285 and R9 M295X, which use the "Tonga" GPU.
It features improved tessellation performance, lossless delta color compression to reduce memory bandwidth usage, an updated and more efficient instruction set, a new high-quality scaler for video, HEVC encoding (VCE 3.0) and HEVC decoding (UVD 6.0), and a new multimedia engine (video encoder/decoder).
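The general idea behind lossless delta compression can be sketched as follows. This is a generic delta encoding over a one-dimensional list of channel values, not AMD's hardware scheme, whose block layout and bit packing are not described here. The function names are hypothetical.

```python
def delta_encode(values):
    """Store the first value, then each value's difference from its predecessor."""
    if not values:
        return []
    encoded = [values[0]]
    for previous, current in zip(values, values[1:]):
        encoded.append(current - previous)
    return encoded


def delta_decode(deltas):
    """Invert delta_encode by accumulating the differences."""
    decoded = []
    running = 0
    for index, delta in enumerate(deltas):
        running = delta if index == 0 else running + delta
        decoded.append(running)
    return decoded
```

Neighbouring pixels in smooth image regions tend to have similar values, so the differences are small and can be stored in fewer bits than the raw values. Decoding recovers the original values exactly, which is what makes the scheme lossless.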
All Polaris-based chips other than the Polaris 30 are produced on the 14 nm FinFET process, developed by Samsung Electronics and licensed to GlobalFoundries.
[39] The slightly newer Polaris 30 refresh is built on the 12 nm LP FinFET process node, developed by Samsung and GlobalFoundries.
The fourth generation of GCN is optimized for the 14 nm FinFET process, enabling higher GPU clock speeds than the third GCN generation.
[40] Architectural improvements include new hardware schedulers, a new primitive discard accelerator, a new display controller, and an updated UVD that can decode HEVC at 4K resolutions at 60 frames per second with 10 bits per color channel.
In January 2017, AMD began releasing details of its next generation of the GCN architecture, termed the "Next-Generation Compute Unit".
Discrete graphics chips also include a High Bandwidth Cache Controller (HBCC), which is absent when the architecture is integrated into APUs.
[47] Additionally, the new chips were expected to include improvements in the rasterisation and render output units.