An artificial neural network (ANN) combines biological principles with advanced statistics to solve problems in domains such as pattern recognition and game-play.
ANNs adopt the basic model of neuron analogues connected to each other in a variety of ways.
The propagation function computes the input $p_j(t)$ to neuron $j$ from the outputs $o_i(t)$ of its predecessor neurons and typically has the form[1]

$$p_j(t) = \sum_i o_i(t) w_{ij}.$$

A bias term can be added, changing the form to the following:[2]

$$p_j(t) = \sum_i o_i(t) w_{ij} + w_{0j}.$$

Neural network models can be viewed as defining a function that takes an input (observation) and produces an output (decision).
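For concreteness, the weighted sum with a bias term can be computed in a few lines of Python; the function and variable names below are illustrative, not drawn from the source.

    # Sketch of the propagation function p_j(t) = sum_i o_i(t) * w_ij + w_0j.
    # Names (predecessor_outputs, weights, bias) are illustrative.
    def propagation(predecessor_outputs, weights, bias=0.0):
        """Weighted sum of predecessor outputs plus an optional bias term."""
        return sum(o * w for o, w in zip(predecessor_outputs, weights)) + bias

    # Example: three predecessor neurons feeding neuron j.
    p_j = propagation([0.5, -1.0, 0.25], [0.4, 0.1, 0.9], bias=0.2)  # 0.525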
Sometimes models are intimately associated with a particular learning rule.
A common use of the phrase "ANN model" is really the definition of a class of such functions (where members of the class are obtained by varying parameters, connection weights, or specifics of the architecture such as the number of neurons, number of layers or their connectivity).
Mathematically, a neuron's network function $f(x)$ is defined as a composition of other functions $g_i(x)$, which can themselves be decomposed into further functions.
This can be conveniently represented as a network structure, with arrows depicting the dependencies between functions.
A widely used type of composition is the nonlinear weighted sum

$$f(x) = K\left(\sum_i w_i g_i(x)\right),$$

where $K$ (commonly referred to as the activation function) is some predefined function, such as the hyperbolic tangent or the sigmoid function.
The important characteristic of the activation function is that it provides a smooth transition as input values change, i.e. a small change in input produces a small change in output.
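A short sketch makes the composition concrete; the choice of $K = \tanh$ and the component functions below are illustrative assumptions.

    import math

    # Sketch of the nonlinear weighted sum f(x) = K(sum_i w_i * g_i(x)),
    # with the activation K chosen here as tanh; gs and ws are illustrative.
    def f(x, gs, ws, K=math.tanh):
        return K(sum(w * g(x) for g, w in zip(gs, ws)))

    gs = [lambda x: x, lambda x: x ** 2]
    ws = [0.7, -0.3]
    # A small change in input produces a small change in output, since tanh is smooth.
    print(f(1.0, gs, ws), f(1.001, gs, ws))  # nearly equal values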
This composition can be seen in two ways. The functional view, in which the input $x$ is simply transformed through successive functions, is most commonly encountered in the context of optimization. The probabilistic view, in which each intermediate value is a random variable depending on the one before it, is most commonly encountered in the context of graphical models. The two views are largely equivalent, and in either case the components of an individual layer are independent of each other given their inputs.
This naturally enables a degree of parallelism in the implementation.
Networks such as the previous one are commonly called feedforward, because their graph is a directed acyclic graph.
Networks with cycles are commonly called recurrent.
Such networks are commonly depicted in the manner shown at the top of the figure, where $f$ appears dependent upon itself.
However, an implied temporal dependence is not shown.
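That temporal dependence can be made explicit by unrolling the cycle over time steps, as in the following sketch; the update rule, weights, and tanh choice are illustrative assumptions.

    import math

    # Sketch of a recurrent unit's implied temporal dependence, unrolled in time:
    # the state h_t depends on h_{t-1}. Names and constants are illustrative.
    def step(h_prev, x, w_h=0.5, w_x=1.0):
        return math.tanh(w_h * h_prev + w_x * x)

    h = 0.0
    for x in [1.0, 0.5, -0.25]:  # the cycle in the graph unrolls into a chain
        h = step(h, x)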
Backpropagation training algorithms fall into three categories: steepest descent (with variable learning rate and momentum, resilient backpropagation); quasi-Newton (Broyden–Fletcher–Goldfarb–Shanno, one step secant); and Levenberg–Marquardt and conjugate gradient (Fletcher–Reeves update, Polak–Ribière update, Powell–Beale restart, scaled conjugate gradient).

Let $N$ be a network with $e$ connections, $m$ inputs and $n$ outputs. Below, $x_1, x_2, \dots$ denote vectors in $\mathbb{R}^m$, $y_1, y_2, \dots$ vectors in $\mathbb{R}^n$, and $w_0, w_1, w_2, \dots$ vectors in $\mathbb{R}^e$. These are called inputs, outputs and weights, respectively.
In supervised learning, a sequence of training examples $(x_1, y_1), \dots, (x_p, y_p)$ produces a sequence of weights $w_0, w_1, \dots, w_p$, starting from some initial weight $w_0$, usually chosen at random. The network computes a function $f_N(w, x)$ that, given a weight $w$, maps an input $x$ to an output, and an error function $E$ measures the distance between two outputs. The weights are computed in turn: $w_i$ is obtained from $(x_i, y_i, w_{i-1})$ by considering a variable weight $w$ and applying gradient descent to the function

$$w \mapsto E(f_N(w, x_i), y_i)$$

to find a local minimum, starting at $w = w_{i-1}$. This makes $w_i$ the minimizing weight found by gradient descent, and the output of the algorithm is $w_p$.
To implement the algorithm above, explicit formulas are required for the gradient of the function $w \mapsto E(f_N(w, x), y)$ with respect to the weight $w$, for instance with the squared error $E(y, y') = |y - y'|^2$.
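The sequential scheme can be sketched on a toy one-parameter model; the model $f_N(w, x) = wx$, the squared error, and the step size below are illustrative assumptions, not part of the source.

    # Sketch of sequential training: each example is fitted by gradient descent
    # starting from the previous weight. Toy model f_N(w, x) = w * x with
    # squared error E(y, y') = (y - y')**2; all constants are illustrative.
    def grad_E(w, x, y):
        return 2 * (w * x - y) * x  # d/dw of (w*x - y)**2

    def train(examples, w0, lr=0.1, steps=50):
        w = w0
        for x, y in examples:        # compute w_i from (x_i, y_i, w_{i-1})
            for _ in range(steps):   # descend toward a local minimum
                w -= lr * grad_E(w, x, y)
        return w                     # w_p, the final weight

    w_p = train([(1.0, 2.0), (2.0, 4.0)], w0=0.0)  # approaches 2.0 for data y = 2x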
The learning algorithm can be divided into two phases: propagation and weight update.
Propagation involves the following steps:

- forward propagation through the network to generate the output value(s);
- calculation of the error between the target and the actual output;
- backward propagation of the output activations through the network to generate the deltas of all output and hidden neurons.

For each weight:

- the weight's output delta and input activation are multiplied to find the gradient of the weight;
- a ratio (percentage) of the weight's gradient is subtracted from the weight.

This ratio, the learning rate, influences the speed and quality of learning.
The greater the ratio, the faster the neuron trains, but the lower the ratio, the more accurate the training. The sign of a weight's gradient indicates the direction in which the error increases; therefore, the weight must be updated in the opposite direction, "descending" the gradient.
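The per-weight update amounts to a single line; the names and values below are illustrative.

    # Sketch of the per-weight update: gradient = output delta * input activation,
    # and a fraction of it (the learning rate) is subtracted from the weight.
    def update_weight(weight, delta, activation, learning_rate=0.5):
        gradient = delta * activation             # gradient of the error w.r.t. this weight
        return weight - learning_rate * gradient  # step against the gradient

    w = update_weight(weight=0.8, delta=0.1, activation=0.6)  # 0.8 - 0.5*0.06 = 0.77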
Learning is repeated (on new batches) until the network performs adequately.
Pseudocode for a stochastic gradient descent algorithm for training a three-layer network (one hidden layer) loops over the training examples, running a forward pass and a backward pass for each before updating the weights. The steps labeled "backward pass" can be implemented using the backpropagation algorithm, which calculates the gradient of the error of the network with respect to the network's modifiable weights.
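A minimal Python sketch of such a training loop follows; it assumes sigmoid activations and squared error, omits bias terms for brevity, and its names and constants are illustrative rather than canonical.

    import math
    import random

    def sigmoid(z):
        return 1.0 / (1.0 + math.exp(-z))

    def train(examples, n_in, n_hidden, n_out, lr=0.5, epochs=2000):
        # initialize network weights (often small random values)
        w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_hidden)]
        w_ho = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_out)]
        for _ in range(epochs):
            for x, target in examples:
                # forward pass: compute hidden and output activations
                h = [sigmoid(sum(w * xi for w, xi in zip(row, x))) for row in w_ih]
                o = [sigmoid(sum(w * hj for w, hj in zip(row, h))) for row in w_ho]
                # error (prediction - actual) at the output units
                delta_o = [(ok - tk) * ok * (1 - ok) for ok, tk in zip(o, target)]
                # backward pass: deltas for the hidden-layer neurons
                delta_h = [hj * (1 - hj) * sum(w_ho[k][j] * delta_o[k] for k in range(n_out))
                           for j, hj in enumerate(h)]
                # update network weights (input layer is not modified by the error estimate)
                for k in range(n_out):
                    for j in range(n_hidden):
                        w_ho[k][j] -= lr * delta_o[k] * h[j]
                for j in range(n_hidden):
                    for i in range(n_in):
                        w_ih[j][i] -= lr * delta_h[j] * x[i]
        return w_ih, w_ho

    # Tiny smoke test on a simple target; practical networks also learn bias terms.
    data = [([0.0, 0.0], [0.0]), ([0.0, 1.0], [0.0]),
            ([1.0, 0.0], [0.0]), ([1.0, 1.0], [1.0])]
    weights = train(data, n_in=2, n_hidden=3, n_out=1)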