In particular, they are inspired by the behaviour of neurons and the electrical signals they convey between input (such as from the eyes or nerve endings in the hand), processing, and output from the brain (such as reacting to light, touch, or heat).
[1][2][3][4] Most artificial neural networks bear only some resemblance to their more complex biological counterparts, but are very effective at their intended tasks (e.g. classification or segmentation).
Neural networks can be hardware- (neurons are represented by physical components) or software-based (computer models), and can use a variety of topologies and learning algorithms.
A time delay neural network (TDNN) is a feedforward architecture for sequential data that recognizes features independent of sequence position.
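As a minimal sketch of position-independent feature detection, the following applies one set of shared weights to every window of a sequence and then pools over time. The function names, the toy pattern and the weights are illustrative assumptions, not taken from a particular TDNN implementation.

```python
import math

def tdnn_layer(frames, weights, bias):
    """Slide one shared set of delay-tap weights over every window of frames.

    frames  : list of T feature vectors (each a list of D floats)
    weights : list of K delay taps, each a list of D floats (one output unit)
    bias    : float
    Returns T - K + 1 activations, one per window position.
    """
    k = len(weights)
    out = []
    for t in range(len(frames) - k + 1):
        total = bias
        for delay in range(k):
            for w, x in zip(weights[delay], frames[t + delay]):
                total += w * x
        out.append(math.tanh(total))
    return out

def max_over_time(activations):
    """Pool over time so the score ignores where the feature occurred."""
    return max(activations)

# Hypothetical 2-frame pattern placed at two different positions.
pattern = [[1.0, 0.0], [0.0, 1.0]]
silence = [0.0, 0.0]
early = pattern + [silence] * 3
late = [silence] * 3 + pattern

weights = [[1.0, -1.0], [-1.0, 1.0]]  # detector tuned to the pattern
early_score = max_over_time(tdnn_layer(early, weights, 0.0))
late_score = max_over_time(tdnn_layer(late, weights, 0.0))
```

Because the same weights are reused at every time offset, the detector responds identically whether the pattern appears early or late in the sequence.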
[32] It formulates the learning as a convex optimization problem with a closed-form solution, emphasizing the mechanism's similarity to stacked generalization.
It offers two important improvements: it uses higher-order information from covariance statistics, and it transforms the non-convex problem of a lower-layer to a convex sub-problem of an upper-layer.
[34] TDSNs use covariance statistics in a bilinear mapping from each of two distinct sets of hidden units in the same layer to predictions, via a third-order tensor.
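The bilinear mapping can be illustrated with a small sketch in which two hidden vectors are combined through a third-order tensor to produce predictions. The names `h1`, `h2` and `tensor`, and all numeric values, are hypothetical and chosen only to show the index structure.

```python
def bilinear_predict(h1, h2, tensor):
    """Compute y[k] = sum_i sum_j tensor[i][j][k] * h1[i] * h2[j]."""
    n_out = len(tensor[0][0])
    y = [0.0] * n_out
    for i, a in enumerate(h1):
        for j, b in enumerate(h2):
            for k in range(n_out):
                y[k] += tensor[i][j][k] * a * b
    return y

# Two distinct sets of hidden units (illustrative values).
h1 = [1.0, 2.0]
h2 = [0.5, -1.0, 3.0]

# Third-order tensor of shape (len(h1), len(h2), number of outputs).
tensor = [
    [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]],
    [[0.0, 2.0], [1.0, 0.0], [0.0, 0.0]],
]

y = bilinear_predict(h1, h2, tensor)
```

The output is linear in each hidden vector separately, so every prediction depends on products of pairs of hidden activations rather than on single activations.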
This ultimately finds neuron activations that minimize mutual input overlap, estimating distributions during recognition and removing the need for complex neural network training and rehearsal.
Radial basis functions have been applied as a replacement for the sigmoidal hidden layer transfer characteristic in multi-layer perceptrons.
In classification problems the fixed non-linearity introduced by the sigmoid output function is most efficiently dealt with using iteratively re-weighted least squares.
A common solution is to associate each data point with its own centre, although this can expand the linear system to be solved in the final layer and requires shrinkage techniques to avoid overfitting.
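A minimal sketch of this approach, assuming one-dimensional inputs, Gaussian basis functions and a ridge penalty as the shrinkage technique, is given below. The helper names, the width and the penalty values are illustrative assumptions rather than a reference implementation.

```python
import math

def gaussian(x, c, width):
    """Gaussian radial basis function centred at c."""
    return math.exp(-((x - c) ** 2) / (2.0 * width ** 2))

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

def fit_rbf(xs, ys, width, lam):
    """Place one centre on each data point and fit ridge-penalized output weights."""
    n = len(xs)
    phi = [[gaussian(x, c, width) for c in xs] for x in xs]
    # Ridge normal equations: (Phi^T Phi + lam * I) w = Phi^T y
    a = [[sum(phi[k][i] * phi[k][j] for k in range(n)) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    b = [sum(phi[k][i] * ys[k] for k in range(n)) for i in range(n)]
    return solve(a, b)

def predict(x, xs, weights, width):
    """Evaluate the fitted network: a weighted sum of basis functions."""
    return sum(w * gaussian(x, c, width) for w, c in zip(weights, xs))

xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [math.sin(x) for x in xs]
w_small = fit_rbf(xs, ys, width=1.0, lam=1e-6)   # nearly interpolates the data
w_large = fit_rbf(xs, ys, width=1.0, lam=10.0)   # heavily shrunk weights
```

With a very small penalty the network nearly interpolates the training points, while a larger penalty shrinks the weight vector and trades training accuracy for smoothness.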
All three approaches use a non-linear kernel function to project the input data into a space where the learning problem can be solved using a linear model.
Like Gaussian processes, and unlike SVMs, RBF networks are typically trained in a maximum likelihood framework by maximizing the probability (minimizing the error).
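The equivalence between maximizing the probability and minimizing the error can be made explicit for the regression case. The derivation below is a sketch that assumes targets $t_n$ generated by the network output $f(x_n; w)$ plus independent Gaussian noise of variance $\sigma^2$; these symbols are introduced here for illustration only.

```latex
t_n = f(x_n; w) + \varepsilon_n, \qquad \varepsilon_n \sim \mathcal{N}(0, \sigma^2)

-\ln p(t_1, \dots, t_N \mid x_1, \dots, x_N, w)
  = \frac{1}{2\sigma^2} \sum_{n=1}^{N} \bigl(t_n - f(x_n; w)\bigr)^2
  + \frac{N}{2} \ln\bigl(2\pi\sigma^2\bigr)
```

Since the second term does not depend on $w$, maximizing the likelihood over the weights is the same as minimizing the sum-of-squares error under this noise assumption.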
An iterative procedure computes the optimal regularization parameter lambda (λ) that minimizes the generalized cross-validation (GCV) error.
[57] At each time step, the input is propagated in a standard feedforward fashion, and then a backpropagation-like learning rule is applied (not performing gradient descent).
Another important feature of ASNN is the possibility to interpret neural network results by analysis of correlations between data cases in the space of models.
The Cascade-Correlation architecture has several advantages: it learns quickly, determines its own size and topology, retains the structures it has built even if the training set changes, and requires no backpropagation.
Depending on the FIS type, several layers simulate the processes involved in fuzzy inference, such as fuzzification, inference, aggregation and defuzzification.
Furthermore, unlike typical artificial neural networks, CPPNs are applied across the entire space of possible inputs so that they can represent a complete image.
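The idea of applying a network across the whole input space can be sketched by evaluating one fixed function at every pixel coordinate. The particular composition of sine, Gaussian and sigmoid units below is a hypothetical example, not a network evolved or trained by any specific method.

```python
import math

def cppn(x, y):
    """Map a coordinate in [-1, 1] x [-1, 1] to an intensity in (0, 1)."""
    r = math.sqrt(x * x + y * y)       # distance from the centre
    h1 = math.sin(3.0 * x)             # periodic unit: repetition along x
    h2 = math.exp(-(r * r) / 0.5)      # Gaussian unit: radial symmetry
    return 1.0 / (1.0 + math.exp(-(h1 + 2.0 * h2)))  # sigmoid output

def render(width, height):
    """Query the same function at every pixel to build a complete image."""
    image = []
    for row in range(height):
        y = 2.0 * row / (height - 1) - 1.0
        image.append([cppn(2.0 * col / (width - 1) - 1.0, y)
                      for col in range(width)])
    return image

low_res = render(8, 8)
high_res = render(64, 64)
```

Because the image is defined by a function of coordinates rather than by stored pixels, the same network can be sampled at any resolution.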
These models have been applied in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base and the output is a textual response.
It is done by creating a specific memory structure, which assigns each new pattern to an orthogonal plane using adjacently connected hierarchical arrays.
HTM is a method for discovering and inferring the high-level causes of observed input patterns and sequences, thus building an increasingly complex model of the world.
For example: Neural Turing machines (NTM)[86] couple LSTM networks to external memory resources, with which they can interact by attentional processes.
The combined system is analogous to a Turing machine but is differentiable end-to-end, allowing it to be efficiently trained by gradient descent.
Preliminary results demonstrate that neural Turing machines can infer simple algorithms such as copying, sorting and associative recall from input and output examples.
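A simplified sketch of an attentional memory read is shown below: a key is compared with every memory row, the similarities are turned into soft weights, and the read vector is the weighted sum of rows. It covers only content-based reading and omits the controller, writing and location-based addressing; the function names and the `sharpness` parameter are assumptions made for illustration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv + 1e-8)

def attention_weights(memory, key, sharpness):
    """Softmax over scaled similarities: a soft, differentiable address."""
    scores = [sharpness * cosine(row, key) for row in memory]
    top = max(scores)                        # subtract max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def read(memory, weights):
    """Return the attention-weighted sum of memory rows."""
    width = len(memory[0])
    return [sum(w * row[j] for w, row in zip(weights, memory))
            for j in range(width)]

memory = [[1.0, 0.0, 0.0],
          [0.0, 1.0, 0.0],
          [0.0, 0.0, 1.0]]
key = [0.1, 0.9, 0.0]
weights = attention_weights(memory, key, sharpness=10.0)
read_vector = read(memory, weights)
```

Every memory row receives a non-zero weight, so the read is a smooth function of the key and the memory contents, which is what allows gradients to flow through the memory access.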
While training extremely deep (e.g., 1 million layers) neural networks might not be practical, CPU-like architectures such as pointer networks[95] and neural random-access machines[96] overcome this limitation by using external random-access memory and other components that typically belong to a computer architecture such as registers, ALU and pointers.
It uses multiple types of units (originally two, called simple and complex cells) as a cascading model for use in pattern recognition tasks.
[109] Among the various kinds of neocognitron[110] are systems that can detect multiple patterns in the same input by using backpropagation to achieve selective attention.
However, these architectures are poor at learning novel classes with few examples, because all network units are involved in representing the input (a distributed representation) and must be adjusted together (high degree of freedom).
They use kernel principal component analysis (KPCA)[131] as a method for the unsupervised greedy layer-wise pre-training step of deep learning.