Parallel Thread Execution

Parallel Thread Execution (PTX or NVPTX[1]) is a low-level parallel thread execution virtual machine and instruction set architecture used in Nvidia's Compute Unified Device Architecture (CUDA) programming environment.

The Nvidia CUDA Compiler (NVCC) translates code written in CUDA, a C++-like language, into PTX instructions, which serve as an intermediate language (IL). The graphics driver contains a compiler that translates the PTX instructions into executable binary code,[2] which runs on the processing cores of Nvidia graphics processing units (GPUs).

The GNU Compiler Collection (GCC) also has a basic ability to generate PTX in the context of OpenMP offloading.

Programs start with a block of declarations. PTX is a three-argument assembly language, and almost all instructions explicitly list the data type (its sign and width) on which they operate.
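The three-argument, explicitly typed instruction form can be illustrated with a minimal sketch. The header values (version 7.0, target sm_50, 64-bit addresses), the entry name typed_ops and the register names are assumptions chosen for illustration and are not taken from the article's sources. Each arithmetic instruction names a destination followed by two sources, and the suffix on the opcode gives the sign and width of the operands.

```ptx
// Illustrative header; the version, target and address size are assumed values.
.version 7.0
.target sm_50
.address_size 64

// Hypothetical entry point that only exercises typed register arithmetic.
.visible .entry typed_ops()
{
    .reg .s32 %r<4>;    // signed 32-bit registers %r0..%r3
    .reg .u64 %rd<4>;   // unsigned 64-bit registers %rd0..%rd3
    .reg .f32 %f<4>;    // 32-bit floating-point registers %f0..%f3

    mov.s32   %r1, -6;
    mov.s32   %r2, 7;
    add.s32   %r3, %r1, %r2;     // %r3 = %r1 + %r2 on signed 32-bit integers

    mov.u64   %rd1, 40;
    mov.u64   %rd2, 2;
    add.u64   %rd3, %rd1, %rd2;  // %rd3 = %rd1 + %rd2 on unsigned 64-bit integers

    mov.f32   %f1, 0f40000000;   // hexadecimal float literal for 2.0
    mul.f32   %f2, %f1, %f1;     // %f2 = %f1 * %f1 on 32-bit floats
    ret;
}
```

The same mnemonic (add) appears with different type suffixes, so the operand type is always visible in the instruction text itself.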

[5] Load (ld) and store (st) instructions name one of several distinct state spaces (memory banks) as part of the opcode, e.g. ld.param.
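As a hedged sketch of how state spaces appear in load and store instructions, the kernel below reads its pointer argument from the .param space and then reads and writes a float in the .global space. The kernel name add_one, the parameter name add_one_param_0 and the header values are hypothetical and chosen only for illustration.

```ptx
// Illustrative header; the version, target and address size are assumed values.
.version 7.0
.target sm_50
.address_size 64

// Hypothetical kernel: adds 1.0 to the float its pointer argument refers to.
.visible .entry add_one(
    .param .u64 add_one_param_0    // kernel argument, held in the .param space
)
{
    .reg .b64 %rd<3>;
    .reg .f32 %f<3>;

    ld.param.u64        %rd1, [add_one_param_0];  // load the pointer from .param
    cvta.to.global.u64  %rd2, %rd1;               // convert generic address to .global
    ld.global.f32       %f1, [%rd2];              // load a float from global memory
    add.f32             %f2, %f1, 0f3F800000;     // add 1.0 (hexadecimal float literal)
    st.global.f32       [%rd2], %f2;              // store the result to global memory
    ret;
}
```

The state space sits between the base opcode and the type suffix, so ld.param.u64 and ld.global.f32 differ both in where they read from and in what type they read.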