Nvidia DGX

These GPUs can be connected either via a version of the SXM socket or a PCIe x16 slot, facilitating flexible integration within the system architecture.

To manage the substantial thermal output, DGX units are equipped with heatsinks and fans designed to maintain optimal operating temperatures.

This design makes DGX units well suited to the computational demands of artificial intelligence and machine learning workloads.

DGX-1 servers feature 8 GPUs based on Pascal or Volta daughter cards[1] with 128 GB of total HBM2 memory, connected by an NVLink mesh network.
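The NVLink mesh is exposed to software as ordinary GPU peer-to-peer access through the standard CUDA runtime API. The following minimal sketch is illustrative rather than DGX-specific: it queries which GPU pairs can address each other directly and enables peer access where the hardware allows it.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Illustrative check of which GPU pairs can access each other directly
// (peer-to-peer); on NVLink-connected systems this traffic uses NVLink.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < n; ++j) {
            if (i == j) continue;
            int ok = 0;
            cudaDeviceCanAccessPeer(&ok, i, j);
            printf("GPU %d -> GPU %d: peer access %s\n",
                   i, j, ok ? "possible" : "not possible");
            if (ok) {
                cudaSetDevice(i);
                // Allows device i to map and directly read/write device j's memory.
                cudaDeviceEnablePeerAccess(j, 0);
            }
        }
    }
    return 0;
}
```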

The product line is intended to bridge the gap between GPUs and AI accelerators using specific features for deep learning workloads.

[7][8] Designed as a turnkey deskside AI supercomputer, the DGX Station is a tower computer that can function completely independently, without typical datacenter infrastructure such as cooling, redundant power, or 19-inch racks.

[10] The DGX Station is water-cooled to better manage the heat of almost 1,500 W of total system components, which allows it to keep noise below 35 dB under load.

[14] The DGX-2 delivers 2 petaflops with 512 GB of shared memory for tackling massive datasets and uses NVSwitch for high-bandwidth internal communication.

Also present are eight 100 Gbit/s InfiniBand cards and 30.72 TB of SSD storage,[15] all enclosed within a massive 10U rackmount chassis and drawing up to 10 kW under maximum load.
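The 512 GB figure refers to the combined HBM2 of the GPUs being usable as a single pool through NVSwitch. A minimal sketch of that idea, using only generic CUDA unified memory (not any DGX-specific software), allocates one managed buffer and migrates it between devices; the sizes and loop are illustrative assumptions.

```cuda
#include <cuda_runtime.h>

// Minimal sketch: one managed allocation that several GPUs can touch.
// On NVSwitch systems the resulting inter-GPU traffic crosses NVSwitch;
// the API itself is plain CUDA unified memory, not DGX-specific.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);

    const size_t bytes = 1ull << 30;   // 1 GiB, chosen arbitrarily
    float *buf = nullptr;
    cudaMallocManaged(&buf, bytes);    // one address range visible to all GPUs

    // Migrate (prefetch) the buffer to each GPU in turn; a kernel on any GPU
    // could also simply dereference it and let pages migrate on demand.
    for (int dev = 0; dev < n; ++dev) {
        cudaSetDevice(dev);
        cudaMemPrefetchAsync(buf, bytes, dev, 0);
        cudaDeviceSynchronize();
    }

    cudaFree(buf);
    return 0;
}
```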

Another notable addition is the presence of two Nvidia BlueField-3 DPUs,[29] and the upgrade to 400 Gbit/s InfiniBand via Mellanox ConnectX-7 NICs, double the bandwidth of the DGX A100.

[30] The DGX H100 has two Xeon Platinum 8480C Scalable CPUs (codenamed Sapphire Rapids)[31] and 2 terabytes of system memory.

Selene, built from 280 DGX A100 nodes, ranked 5th on the TOP500 list of the most powerful supercomputers at the time of its completion in June 2020,[36] and has remained highly ranked since[citation needed].

This same integration is available to any customer with minimal effort on their part, and the new Hopper-based SuperPod can scale to 32 DGX H100 nodes, for a total of 256 H100 GPUs and 64 x86 CPUs.
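At this scale, deep learning workloads rely on collective communication across the GPUs. The sketch below shows the basic pattern (an NCCL all-reduce) for the GPUs within a single node; it is a generic, illustrative example, and scaling across nodes, as in a SuperPod, would instead use ncclCommInitRank with an external launcher such as MPI or Slurm. Buffer sizes are arbitrary assumptions.

```cuda
#include <cuda_runtime.h>
#include <nccl.h>
#include <vector>

// Single-process sketch of the all-reduce collective that data-parallel
// training runs across the GPUs of one node.
int main() {
    int n = 0;
    cudaGetDeviceCount(&n);

    std::vector<int> devs(n);
    for (int i = 0; i < n; ++i) devs[i] = i;

    std::vector<ncclComm_t> comms(n);
    ncclCommInitAll(comms.data(), n, devs.data());   // one communicator per GPU

    const size_t count = 1 << 20;                     // elements per GPU, arbitrary
    std::vector<float*> buf(n);
    std::vector<cudaStream_t> stream(n);
    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        cudaMalloc(&buf[i], count * sizeof(float));   // would hold gradients in practice
        cudaStreamCreate(&stream[i]);
    }

    // Sum every GPU's buffer into every GPU (in place).
    ncclGroupStart();
    for (int i = 0; i < n; ++i)
        ncclAllReduce(buf[i], buf[i], count, ncclFloat, ncclSum, comms[i], stream[i]);
    ncclGroupEnd();

    for (int i = 0; i < n; ++i) {
        cudaSetDevice(i);
        cudaStreamSynchronize(stream[i]);
        cudaFree(buf[i]);
        cudaStreamDestroy(stream[i]);
        ncclCommDestroy(comms[i]);
    }
    return 0;
}
```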

[Image captions: DGX H100 system; top view showing the GPU tray]