[2] Google began using TPUs internally in 2015, and in 2018 made them available for third-party use, both as part of its cloud infrastructure and by offering a smaller version of the chip for sale.
[4] The TPU ASICs are mounted in a heatsink assembly, which can fit in a hard drive slot within a data center rack, according to Norman Jouppi.
TPUs are well suited for CNNs, while GPUs have benefits for some fully-connected neural networks, and CPUs can have advantages for RNNs.
"[7] The tensor processing unit was announced in May 2016 at Google I/O, when the company said that the TPU had already been used inside their data centers for over a year.
[5][4] Google's 2017 paper describing its creation cites previous systolic matrix multipliers of similar architecture built in the 1990s.
[8] The chip has been specifically designed for Google's TensorFlow framework, a symbolic math library which is used for machine learning applications such as neural networks.
Some models are commercially available, and on February 12, 2018, The New York Times reported that Google "would allow other companies to buy access to those chips through its cloud-computing service.
[14][15] 393 (int8) 918 (int8) 1836 (int8) The first-generation TPU is an 8-bit matrix multiplication engine, driven with CISC instructions by the host processor across a PCIe 3.0 bus.
[18] Instructions transfer data to or from the host, perform matrix multiplications or convolutions, and apply activation functions.
[28] Google announced that processors themselves are twice as powerful as the second-generation TPUs, and would be deployed in pods with four times as many chips as the preceding generation.
[34] In 2021, Google revealed the physical layout of TPU v5 is being designed with the assistance of a novel application of deep reinforcement learning.
The Edge TPU is Google's purpose-built ASIC chip designed to run machine learning (ML) models for edge computing, meaning it is much smaller and consumes far less power compared to the TPUs hosted in Google datacenters (also known as Cloud TPUs[43]).
Google describe it as "customized to meet the requirements of key camera features in Pixel 4", using a neural network search that sacrifices some accuracy in favor of minimizing latency and power use.
[56] In 2019, Singular Computing, founded in 2009 by Joseph Bates, a visiting professor at MIT,[57] filed suit against Google alleging patent infringement in TPU chips.
In a 2023 court filing, Singular Computing specifically called out Google's use of bfloat16, as that exceeds the dynamic range of float16.