Cray XMT

Designed to make use of commodity parts and existing subsystems for other commercial systems, it alleviated the shortcomings of Cray MTA-2's high cost of fully custom manufacture and support.

[5][7] Front-end (login, I/O, and other service nodes, utilizing AMD Opteron processors and running SLES Linux) and back-end (compute nodes, utilizing Threadstorm3 processors and running MTK, a simple BSD Unix-based microkernel[3]) communicate through the LUC (Lightweight User Communication) interface, a RPC-style bidirectional client/server interface.

[5][nb 2] Due to the MTA's pipeline length of 21, each stream is selected to execute instructions again no prior than 21 cycles later.

[11] The TDP of the processor package is 30 W.[12] Due to their thread-level context switch at each cycle, performance of Threadstorm CPUs is not constrained by memory access time.

The architecture excels in data walking schemes where subsequent memory access cannot be easily predicted and thus wouldn't be well suited to a conventional cache model.

[nb 5] Though the longer burst length could be compensated by higher speeds of DDR3, it would also require more power, which Cray engineers wanted to avoid.

Most of Threadstorm3's features would be retained, including the multiplexing of many hardware streams onto an execution pipeline and the implementation of additional state bits for every 64-bit memory word.

Cray XMT2 was bought by several federal laboratories and academic facilities, as well as some commercial HPC clients: e.g. CSCS (2 TB global memory with 64 Threadstorm4 CPUs),[15] Noblis CAHPC.