80 Million Tiny Images

80 Million Tiny Images is a dataset intended for training machine learning systems. It was constructed by Antonio Torralba, Rob Fergus, and William T. Freeman in a collaboration between MIT and New York University.

The dataset was motivated by non-parametric models of neural activations in the visual cortex upon seeing images.[1]

The creators began with all 75,846 non-abstract nouns in WordNet and, for each noun, scraped seven image search engines: AltaVista, Ask.com, Flickr, Cydral, Google, Picsearch, and Webshots.

Because they did not have enough storage, they downsized the images to 32×32 pixels as they were scraped.[1]

The 80 Million Tiny Images dataset was retired from use by its creators in 2020,[5] after a paper by researchers Abeba Birhane and Vinay Prabhu found that the labels in several publicly available image datasets, including 80 Million Tiny Images, contained racist and misogynistic slurs, which were causing models trained on them to exhibit racial and sexual bias.
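The storage constraint behind the 32×32 downsizing mentioned earlier can be made concrete with a rough sketch. The code below is illustrative only and is not the creators' pipeline: the box-averaging method, the function names `box_downsample` and `raw_size_bytes`, the nominal count of 80 million images, and the assumption of three uncompressed one-byte colour channels are all hypothetical choices made for the example.

```python
def box_downsample(pixels, out_size=32):
    """Shrink a square grayscale image (a list of rows of ints) by box averaging.

    Assumption: the side length is an exact multiple of out_size. This is a
    generic technique, not necessarily the one used to build the dataset.
    """
    n = len(pixels)
    if n % out_size != 0 or any(len(row) != n for row in pixels):
        raise ValueError("expected a square image whose side is a multiple of out_size")
    block = n // out_size      # side of each averaged block
    area = block * block       # number of source pixels per output pixel
    return [
        [
            sum(
                pixels[r * block + dr][c * block + dc]
                for dr in range(block)
                for dc in range(block)
            ) // area
            for c in range(out_size)
        ]
        for r in range(out_size)
    ]


def raw_size_bytes(side, num_images=80_000_000, channels=3):
    """Uncompressed size of num_images square images at one byte per channel.

    Assumption: the image count and channel count are nominal values chosen
    for illustration, not figures taken from the article.
    """
    return side * side * channels * num_images


if __name__ == "__main__":
    tiny = raw_size_bytes(32)
    larger = raw_size_bytes(256)   # hypothetical alternative resolution
    print(f"32x32:   {tiny / 1e9:.2f} GB")
    print(f"256x256: {larger / 1e12:.2f} TB ({larger // tiny}x larger)")
```

Under these assumptions, 80 million images at 32×32 occupy roughly 246 GB uncompressed, whereas the same number at a hypothetical 256×256 would need 64 times as much space.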