History of artificial neural networks

Artificial neural networks (ANNs) are models created using machine learning to perform a number of tasks.

[3] It greatly outperformed other image recognition models, and is thought to have launched the ongoing AI spring, further increasing interest in deep learning.

The simplest feedforward network consists of a single weight layer without activation functions.
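
In modern matrix notation (chosen here for illustration, not taken from the article's sources), such a single-layer network computes only an affine map of its input,

\hat{y} = W x + b,

where x is the input vector, W the weight matrix and b the bias vector; fitting W and b under a squared-error loss amounts to linear regression.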

Linear regression by the least squares method was used by Adrien-Marie Legendre (1805) and Carl Friedrich Gauss (1795) for the prediction of planetary movement.
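
In the matrix form used today (notation assumed here for reference), the least-squares fit minimizes the sum of squared residuals and, when X^\top X is invertible, has the closed-form solution

\hat{w} = (X^\top X)^{-1} X^\top y,

where X collects the observed inputs row by row and y is the vector of corresponding observed outputs.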

Using mathematical notation, Rosenblatt described circuitry beyond the basic perceptron, such as the exclusive-or circuit, which could not be processed by neural networks at the time.

[16]: section 16  Some consider that the 1962 book developed and explored all of the basic ingredients of the deep learning systems of today.

[21] The first deep learning multilayer perceptron trained by stochastic gradient descent[22] was published in 1967 by Shun'ichi Amari.

[24] Subsequent developments in hardware and hyperparameter tuning have made end-to-end stochastic gradient descent the currently dominant training technique.

Backpropagation is an efficient application of the chain rule derived by Gottfried Wilhelm Leibniz in 1673[25] to networks of differentiable nodes.
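
As a minimal illustration (the notation is assumed here, not drawn from the cited works), for a network that computes a composition of differentiable maps y = f_L(f_{L-1}(\cdots f_1(x))), with h_0 = x and intermediate values h_\ell = f_\ell(h_{\ell-1}), the chain rule gives

\frac{\partial y}{\partial x} = \frac{\partial f_L}{\partial h_{L-1}} \, \frac{\partial f_{L-1}}{\partial h_{L-2}} \cdots \frac{\partial f_1}{\partial x},

and backpropagation evaluates this product from the output backwards, reusing each intermediate factor so that all parameter gradients are obtained in a single backward pass.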

The term "back-propagating errors" was introduced in 1962 by Rosenblatt,[16] but he did not know how to implement it, although Henry J. Kelley had developed a continuous precursor of backpropagation in 1960 in the context of control theory.

[38] In 1933, Lorente de Nó discovered "recurrent, reciprocal connections" by Golgi's method, and proposed that excitatory loops explain certain aspects of the vestibulo-ocular reflex.

[42] Sepp Hochreiter's diploma thesis (1991)[43] proposed the neural history compressor, and identified and analyzed the vanishing gradient problem.

[43][44] In 1993, a neural history compressor system solved a "Very Deep Learning" task that required more than 1000 subsequent layers in an RNN unfolded in time.

Long short-term memory (LSTM) networks were invented by Hochreiter and Schmidhuber in 1995 and set accuracy records in multiple application domains.

[50][51] LSTM also improved large-vocabulary speech recognition[52][53] and text-to-speech synthesis[54] and was used in Google voice search and dictation on Android devices.

[65] The time delay neural network (TDNN) was introduced in 1987 by Alex Waibel and was one of the first CNNs, as it achieved shift invariance.

[71] In 1989, Yann LeCun et al. trained a CNN with the purpose of recognizing handwritten ZIP codes on mail.

Subsequently, Wei Zhang et al. modified their model by removing the last fully connected layer and applied it to medical image object segmentation in 1991[74] and breast cancer detection in mammograms in 1994.

Processing higher-resolution images requires larger and deeper CNNs, so the technique is constrained by the availability of computing resources.

[87] In 2011, a CNN named DanNet[88][89] by Dan Ciresan, Ueli Meier, Jonathan Masci, Luca Maria Gambardella, and Jürgen Schmidhuber achieved superhuman performance in a visual pattern recognition contest for the first time, outperforming traditional methods by a factor of 3.

For example, in 2011, Xavier Glorot, Antoine Bordes and Yoshua Bengio found that the ReLU[63] worked better than the activation functions that were widely used at the time.
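
The rectified linear unit itself is simply defined as

\operatorname{ReLU}(x) = \max(0, x),

which, unlike the logistic sigmoid or hyperbolic tangent, does not saturate for large positive inputs.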

In October 2012, AlexNet by Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton[93] won the large-scale ImageNet competition by a significant margin over shallow machine learning methods.

It was also noticed that saccade control is modulated by cognitive processes, in that the eye moves preferentially towards areas of high salience.

[132][133] A seq2seq architecture employs two RNNs, typically LSTMs: an "encoder" and a "decoder", used for sequence transduction tasks such as machine translation.
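
Schematically (with notation chosen here for illustration), the encoder maps the input sequence to a fixed representation c, and the decoder generates the output sequence one token at a time, conditioned on c and on its own previous outputs:

c = \operatorname{Encoder}(x_1, \dots, x_n), \qquad p(y_1, \dots, y_m \mid x_1, \dots, x_n) = \prod_{t=1}^{m} p(y_t \mid y_1, \dots, y_{t-1}, c).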

One problem with seq2seq models was their use of recurrent neural networks, which are not parallelizable across time, as both the encoder and the decoder process the sequence token by token.

Seq2seq models with attention still suffered from the same issue as other recurrent networks: they are hard to parallelize, which prevented them from being accelerated on GPUs.

SOMs create internal representations reminiscent of the cortical homunculus, a distorted representation of the human body based on a neurological "map" of the areas and proportions of the human brain dedicated to processing sensory functions for different parts of the body.

The restricted Boltzmann machine (RBM) is a generative stochastic neural network that can learn a probability distribution over its set of inputs.
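
In the standard formulation (the symbols below are the conventional ones, assumed here for illustration), a binary RBM with visible units v, hidden units h, bias vectors a and b, and weight matrix W assigns each configuration the energy

E(v, h) = -a^\top v - b^\top h - v^\top W h, \qquad P(v, h) = \frac{e^{-E(v, h)}}{Z},

where Z is a normalizing constant; the learned distribution over inputs is obtained by summing P(v, h) over the hidden units.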

[152][153] In 2012, Andrew Ng and Jeff Dean created an FNN that learned to recognize higher-level concepts, such as cats, only from watching unlabeled images taken from YouTube videos.