ADALINE

ADALINE (Adaptive Linear Neuron, later Adaptive Linear Element) is an early single-layer artificial neural network and the name of the physical device that implemented it.[2][3][1][4][5] It was developed by Professor Bernard Widrow and his doctoral student Marcian Hoff at Stanford University in 1960.

ADALINE differs from the standard (Rosenblatt) perceptron in how it learns: the ADALINE unit adjusts its weights to match a teacher signal before applying the Heaviside step function (see figure), whereas the standard perceptron unit adjusts its weights to match the correct output after applying the Heaviside function.

The ADALINE unit computes a weighted sum of its inputs, $y = \sum_{j=1}^{n} x_j w_j + \theta$, where $x$ is the input vector, $w$ is the weight vector, and $\theta$ is a bias term. If we further assume a constant extra input $x_0 = 1$ with weight $w_0 = \theta$, then the output further reduces to $y = \sum_{j=0}^{n} x_j w_j$.

The learning rule used by ADALINE is the LMS ("least mean squares") algorithm, a special case of gradient descent.

Given a learning rate $\eta$, an input vector $x$, the linear output $y$, and the desired (teacher) output $o$, the LMS update is $w \leftarrow w + \eta\,(o - y)\,x$. This update rule minimizes $E = (o - y)^2$, the square of the error,[6] and is in fact the stochastic gradient descent update for linear regression.[7]
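To make the update concrete, here is a minimal sketch of training a single ADALINE unit with the LMS rule, written in Python with NumPy. The function names (train_adaline, predict), the learning-rate and epoch values, and the toy AND problem are illustrative choices rather than anything from the original description; the key point it shows is that the weight update uses the linear output computed before the Heaviside step, unlike the standard perceptron rule.

```python
import numpy as np

def train_adaline(X, targets, eta=0.01, epochs=50, rng=None):
    """Train a single ADALINE unit with the LMS (delta) rule."""
    rng = np.random.default_rng(rng)
    # Append a constant input x_0 = 1 so that w_0 plays the role of the bias theta.
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    w = rng.normal(scale=0.01, size=Xb.shape[1])
    for _ in range(epochs):
        for x, o in zip(Xb, targets):
            y = w @ x                # linear output, computed *before* the Heaviside step
            w += eta * (o - y) * x   # LMS update: stochastic gradient step on (o - y)^2
    return w

def predict(w, X):
    """Classify by applying the Heaviside / sign step to the linear output."""
    Xb = np.hstack([np.ones((X.shape[0], 1)), X])
    return np.where(Xb @ w >= 0.0, 1, -1)

# Toy usage: learn logical AND on +/-1 encoded inputs.
X = np.array([[-1., -1.], [-1., 1.], [1., -1.], [1., 1.]])
t = np.array([-1., -1., -1., 1.])
w = train_adaline(X, t, eta=0.05, epochs=100)
print(predict(w, X))  # expected: [-1 -1 -1  1]
```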

MADALINE (Many ADALINE[8]) is a three-layer (input, hidden, output), fully connected, feedforward neural network architecture for classification that uses ADALINE units in its hidden and output layers.
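A majority-vote variant of this architecture, which is what the training schemes described below act on, can be sketched as follows. The helper name madaline_forward, the bias handling, and the exact layer layout are assumptions made for illustration, not details taken from the source.

```python
import numpy as np

def madaline_forward(X, W_hidden):
    """Forward pass of a simplified MADALINE with a majority-vote output.

    X        : (n_samples, n_features) inputs, assumed to already include a bias column
    W_hidden : (n_features, n_hidden) weights of the hidden ADALINE units
    Returns the hidden linear activations, their +/-1 outputs, and the voted output.
    """
    a = X @ W_hidden                         # linear activation of each hidden ADALINE
    h = np.where(a >= 0.0, 1, -1)            # Heaviside / sign output of the hidden layer
    y = np.where(h.sum(axis=1) >= 0, 1, -1)  # majority vote across hidden units
    return a, h, y
```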

Despite many attempts, Widrow and his colleagues never succeeded in training more than a single layer of weights in a MADALINE model.

Another training method is a "job assigner": suppose the desired output is −1 and it differs from the majority-voted output. The job assigner then calculates the minimal number of ADALINE units that must change their outputs from positive to negative, picks the ADALINE units whose outputs are closest to being negative, and has them update their weights according to the ADALINE learning rule.[12][13]
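A rough sketch of one such job-assigner step is given below, reusing the madaline_forward helper above. How the minimal number of flips is counted and how ties among candidate units are broken are assumptions for illustration, not the exact published procedure.

```python
import numpy as np

def job_assigner_update(x, a, h, W_hidden, eta=0.01, desired=-1):
    """One 'job assigner' step for an example whose desired output is -1
    but whose majority vote came out +1 (the case described above).

    x        : (n_features,) input vector, including the bias component
    a, h     : hidden-unit linear activations and +/-1 outputs for this example
    W_hidden : (n_features, n_hidden) hidden ADALINE weights, modified in place
    """
    n_hidden = h.size
    n_pos = int((h == 1).sum())
    # Minimal number of units that must flip from +1 to -1 for the vote to turn
    # negative (rough count; the exact figure depends on how ties are handled).
    need = n_pos - n_hidden // 2
    # Units currently outputting +1 with the smallest activations are
    # "closest to being negative" and therefore cheapest to flip.
    candidates = np.where(h == 1)[0]
    order = candidates[np.argsort(a[candidates])]
    for j in order[:max(need, 0)]:
        # Push the chosen unit toward the desired negative output with the ADALINE/LMS rule.
        W_hidden[:, j] += eta * (desired - a[j]) * x
```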

Some MADALINE machines were demonstrated to perform tasks including inverted pendulum balancing, weather forecasting, and speech recognition.

MADALINE Rule II, described in 1988, improved on Rule I.[8] The Rule II training algorithm is based on a principle called "minimal disturbance".

It proceeds by looping over training examples; for each example, it finds the hidden-layer ADALINE unit with the lowest confidence in its prediction (the linear activation closest to zero), tentatively flips the sign of that unit's output, and accepts or rejects the change depending on whether the network's error is reduced, stopping when the error is eliminated.

MADALINE Rule III, the third "Rule", applies to a modified network with sigmoid activations instead of the sign function; it was later found to be equivalent to backpropagation.
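A minimal-disturbance sweep in the spirit of Rule II might be sketched as follows, again reusing the madaline_forward helper above. The acceptance test and the way a unit's output is nudged toward the opposite sign are simplifications chosen for illustration, not the exact published algorithm.

```python
import numpy as np

def madaline_rule2_epoch(X, targets, W_hidden, eta=0.1):
    """One sweep of a minimal-disturbance style update over the training set.

    For each misclassified example, the hidden ADALINE whose activation is
    closest to zero (lowest confidence) is tentatively nudged so its output
    flips sign; the change is kept only if the network's error count does
    not increase.  Uses madaline_forward from the sketch above.
    """
    def error_count(W):
        _, _, y = madaline_forward(X, W)
        return int((y != targets).sum())

    for x, t in zip(X, targets):
        a, h, y = madaline_forward(x[None, :], W_hidden)
        if y[0] == t:
            continue                          # already correct: disturb nothing
        j = int(np.argmin(np.abs(a[0])))      # least-confident hidden unit
        trial = W_hidden.copy()
        # Nudge unit j's weights toward the opposite-sign output (LMS-style step).
        trial[:, j] += eta * (-h[0, j] - a[0, j]) * x
        if error_count(trial) <= error_count(W_hidden):
            W_hidden[:, :] = trial            # accept the minimal disturbance
    return W_hidden
```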

Figure: Learning inside a single-layer ADALINE.
Figure: Photo of an ADALINE machine, with hand-adjustable weights implemented by rheostats.
Figure: Schematic of a single ADALINE unit.[1]