In scientific computing, GotoBLAS and GotoBLAS2 are open-source implementations of the BLAS (Basic Linear Algebra Subprograms) API with many hand-crafted optimizations for specific processor types.[1]
GotoBLAS remains available, but development ceased with a final version touting optimal performance on Intel's Nehalem architecture (contemporary in 2008).[2]
OpenBLAS is an actively maintained fork of GotoBLAS, developed at the Lab of Parallel Software and Computational Science, ISCAS.
GotoBLAS's matrix-matrix multiplication routine, called GEMM in BLAS terms, is highly tuned for the x86 and AMD64 processor architectures by means of hand-crafted assembly code.[4]
As of January 2022, the Texas Advanced Computing Center website[5] states that GotoBLAS is no longer maintained and suggests the use of BLIS or MKL.
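For readers unfamiliar with the GEMM routine mentioned above, the operation it performs is the update C ← αAB + βC, where A, B and C are matrices and α and β are scalars. The following is a minimal reference sketch of that operation in plain Python. It is an illustration only: the function name `gemm` and its argument order are assumptions made for this example, it omits the transpose options of the real BLAS interface, and it bears no resemblance to the hand-tuned assembly kernels used by GotoBLAS.

```python
def gemm(alpha, a, b, beta, c):
    """Return alpha * (a @ b) + beta * c for matrices stored as lists of rows.

    Hypothetical helper for illustration; not the BLAS calling convention.
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")

    result = []
    for i in range(m):
        row = []
        for j in range(n):
            # Dot product of row i of a with column j of b.
            dot = sum(a[i][p] * b[p][j] for p in range(k))
            row.append(alpha * dot + beta * c[i][j])
        result.append(row)
    return result


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
c = [[1.0, 1.0], [1.0, 1.0]]
print(gemm(2.0, a, b, 0.5, c))  # [[38.5, 44.5], [86.5, 100.5]]
```

A naive triple loop like this performs the same arithmetic as an optimized GEMM but makes no attempt to exploit caches or vector instructions, which is where tuned implementations obtain their speed.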