ImageNet

[7] In 2007, Li met with Princeton professor Christiane Fellbaum, one of the creators of WordNet, to discuss the project.

As a result of this meeting, Li went on to build ImageNet starting from the roughly 22,000 nouns of WordNet and using many of its features.

[10] As an assistant professor at Princeton, Li assembled a team of researchers to work on the ImageNet project.

[11] They presented their database for the first time as a poster at the 2009 Conference on Computer Vision and Pattern Recognition (CVPR) in Florida, titled "ImageNet: A Preview of a Large-scale Hierarchical Dataset".

[6][8] On 30 September 2012, a convolutional neural network (CNN) called AlexNet[16] achieved a top-5 error of 15.3% in the ImageNet 2012 Challenge, more than 10.8 percentage points lower than that of the runner-up.
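The top-5 error metric counts a prediction as correct if the true label appears among a model's five highest-scoring classes, and reports the fraction of examples for which it does not. A minimal sketch of the computation (toy scores, not from any actual model):

```python
import numpy as np

def top5_error(scores, labels):
    """Fraction of examples whose true label is NOT among the
    five highest-scoring classes."""
    # indices of each example's top-5 classes (order within the five is irrelevant)
    top5 = np.argsort(scores, axis=1)[:, -5:]
    hits = np.any(top5 == labels[:, None], axis=1)
    return 1.0 - hits.mean()

# toy example: 3 examples, 10 classes, scores rise with class index,
# so classes 5-9 are always the top five
scores = np.tile(np.arange(10.0), (3, 1))
labels = np.array([9, 5, 0])  # two labels in the top five, one not
print(top5_error(scores, labels))
```

With two of the three labels in the top five, the sketch reports an error of 1/3.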

Using convolutional neural networks was feasible due to the use of graphics processing units (GPUs) during training,[16] an essential ingredient of the deep learning revolution.

According to The Economist, "Suddenly people started to pay attention, not just within the AI community but across the technology industry as a whole."

ImageNet uses a variant of the broad WordNet schema to categorize objects, augmented with 120 categories of dog breeds to showcase fine-grained classification.

In machine learning pipelines, the images are typically preprocessed to a standard, constant resolution and whitened before further processing by neural networks.
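A minimal sketch of that preprocessing step in pure NumPy, with two simplifying assumptions: resizing is done by block averaging (real pipelines use proper image resamplers), and "whitening" is approximated by per-channel standardization, a common stand-in for full decorrelating whitening:

```python
import numpy as np

def preprocess(image, size=224):
    """Downsample an H x W x 3 image to size x size by block averaging,
    then standardize each channel to zero mean and unit variance."""
    h, w, c = image.shape
    assert h % size == 0 and w % size == 0, "toy resize: dims must divide evenly"
    bh, bw = h // size, w // size
    # group pixels into bh x bw blocks and average each block
    resized = image.reshape(size, bh, size, bw, c).mean(axis=(1, 3))
    # per-channel standardization over the spatial dimensions
    mean = resized.mean(axis=(0, 1), keepdims=True)
    std = resized.std(axis=(0, 1), keepdims=True)
    return (resized - mean) / (std + 1e-8)

img = np.random.default_rng(0).random((448, 448, 3))  # stand-in for a photo
out = preprocess(img)
print(out.shape)  # (224, 224, 3)
```

After this step every image has the same shape and roughly zero mean and unit variance per channel, which is what downstream networks expect.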

[29] The ILSVRC aims to "follow in the footsteps" of the smaller-scale PASCAL VOC challenge, established in 2005, which contained only about 20,000 images and twenty object classes.

[8] The resulting annual competition is now known as the ImageNet Large Scale Visual Recognition Challenge (ILSVRC).

The ILSVRC uses a "trimmed" list of only 1000 image categories or "classes", including 90 of the 120 dog breeds classified by the full ImageNet schema.

These pre-deep-learning systems used hand-engineered features: a dense grid of HOG (histogram of oriented gradients) and LBP (local binary pattern) descriptors, sparsified by local coordinate coding and pooling.
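Local binary patterns encode each pixel by thresholding its eight neighbours against the centre value and packing the results into an 8-bit code. A minimal NumPy sketch of the basic 3×3 operator (the competition systems used tuned multi-scale variants, not this exact form):

```python
import numpy as np

def lbp_3x3(gray):
    """Basic local binary pattern: for each interior pixel of a grayscale
    image, compare the 8 neighbours to the centre value and pack the
    comparison results into an 8-bit code."""
    c = gray[1:-1, 1:-1]  # interior pixels (those with a full neighbourhood)
    # neighbour offsets, clockwise from top-left
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    code = np.zeros(c.shape, dtype=int)
    for bit, (dy, dx) in enumerate(offsets):
        # neighbour values shifted so they align with the interior pixels
        nb = gray[1 + dy:gray.shape[0] - 1 + dy,
                  1 + dx:gray.shape[1] - 1 + dx]
        code += (nb >= c).astype(int) << bit
    return code

img = np.array([[1, 2, 3],
                [4, 5, 6],
                [7, 8, 9]], dtype=float)
print(lbp_3x3(img))  # one interior pixel, one 8-bit code
```

Histograms of these codes over image cells form the LBP descriptor; the ILSVRC-era systems computed them on a dense grid alongside HOG.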

In 2012, a deep convolutional neural network called AlexNet achieved a top-5 accuracy of 84.7%, a great leap forward.

While the 2012 breakthrough "combined pieces that were all there before", the dramatic quantitative improvement marked the start of an industry-wide artificial intelligence boom.

[39] In 2017, the ImageNet team stated that it would roll out a new, much more difficult challenge in 2018 involving classifying 3D objects using natural language.

[1] By 2015, researchers at Microsoft reported that their CNNs exceeded human ability at the narrow ILSVRC tasks.

[42] It has also been found that around 10% of ImageNet-1k images carry ambiguous or erroneous labels, and that, when shown both a model's prediction and the original ImageNet label, human annotators preferred the prediction of a state-of-the-art model from 2020 trained on the original ImageNet, suggesting that ImageNet-1k has become saturated.

[43] A 2019 study of the history of the multiple layers (taxonomy, object classes and labeling) of ImageNet and WordNet described how bias is deeply embedded in most classification approaches for all sorts of images.

[48] One downside of using WordNet is that its categories may be more "elevated" than would be optimal for ImageNet: "Most people are more interested in Lady Gaga or the iPod Mini than in this rare kind of diplodocus."

Error rate history on ImageNet (showing best result per team and up to 10 entries per year)