Oja's rule

Oja's rule is a modification of the standard Hebbian learning rule that, through multiplicative normalization, solves all stability problems and generates an algorithm for principal components analysis.

In component form as a difference equation, it is written

\Delta w_i = w_i(n+1) - w_i(n) = \eta \, y(x_n) \bigl( x_i(n) - y(x_n) \, w_i(n) \bigr),

or in scalar form with implicit n-dependence,

\Delta w_i = \eta \, y \, ( x_i - y \, w_i ),

where y(x_n) is again the output, this time explicitly dependent on its input vector x. By contrast, the simple Hebbian update \Delta w_i = \eta \, y(x_n) \, x_i(n) has synaptic weights approaching infinity with a positive learning rate.

To make Hebb's rule converge, the weights are renormalized after every update, giving the multiplicative form

w_i(n+1) = \frac{w_i(n) + \eta \, y(x_n) \, x_i(n)}{\left( \sum_{j=1}^{m} \left[ w_j(n) + \eta \, y(x_n) \, x_j(n) \right]^{p} \right)^{1/p}},

which, for a small learning rate |\eta| \ll 1, can be expanded as a power series in \eta whose higher-order terms vanish.

We again make the specification of a linear neuron, that is, the output of the neuron is equal to the sum of the product of each input and its synaptic weight to the power of p - 1, which in the case of p = 2 is the synaptic weight itself, or

y(x_n) = \sum_{j=1}^{m} x_j(n) \, w_j^{p-1}(n).

We also specify that our weights normalize to 1, which will be a necessary condition for stability, so

|w| = \left( \sum_{j=1}^{m} w_j^{p} \right)^{1/p} = 1,

which, when substituted into our expansion, gives Oja's rule,

w_i(n+1) = w_i(n) + \eta \, y(x_n) \bigl( x_i(n) - y(x_n) \, w_i(n) \bigr).

In analyzing the convergence of a single neuron evolving by Oja's rule, one extracts the first principal component, or feature, of a data set.
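As a rough numerical illustration of this stabilization (a minimal sketch with invented data, step size, and variable names, not code from the original literature), the following Python snippet feeds the same stream of random inputs to a plain Hebbian update and to Oja's rule: the Hebbian weight norm grows without bound, while the Oja weight norm settles near 1, consistent with the |w| = 1 condition above, and the Oja weight vector ends up pointing roughly along the dominant input direction.

import numpy as np

rng = np.random.default_rng(0)
# Synthetic zero-mean 2-D inputs with one strongly dominant direction of variance.
X = rng.normal(size=(500, 2)) @ np.array([[3.0, 0.0], [1.0, 1.0]])

eta = 0.01
w_hebb = np.array([0.1, 0.1])
w_oja = np.array([0.1, 0.1])

for x in X:
    y_h = w_hebb @ x                               # linear neuron output
    w_hebb = w_hebb + eta * y_h * x                # plain Hebb: no normalization
    y_o = w_oja @ x
    w_oja = w_oja + eta * y_o * (x - y_o * w_oja)  # Oja's rule

print("Hebbian weight norm:", np.linalg.norm(w_hebb))   # huge: diverges
print("Oja weight norm:    ", np.linalg.norm(w_oja))    # close to 1
print("Oja direction:      ", w_oja / np.linalg.norm(w_oja))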

Furthermore, with extensions using the Generalized Hebbian Algorithm, one can create a multi-Oja neural network that can extract as many features as desired, allowing for principal components analysis.
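One standard way to realize that extension is Sanger's formulation of the Generalized Hebbian Algorithm, ΔW = η ( y xᵀ − LT[y yᵀ] W ), where LT[·] keeps only the lower-triangular part. The sketch below (with illustrative data and parameters of our own choosing) recovers the two leading principal directions of a simple 3-D dataset:

import numpy as np

rng = np.random.default_rng(1)
# 3-D data with covariance diag(16, 4, 0.25), so the true principal
# directions are simply the coordinate axes, ordered by variance.
X = rng.normal(size=(20000, 3)) @ np.diag([4.0, 2.0, 0.5])

n_components = 2
W = rng.normal(scale=0.1, size=(n_components, 3))  # one weight row per output neuron
eta = 0.002

for x in X:
    y = W @ x
    # Generalized Hebbian (Sanger) update: a Hebbian term minus a
    # lower-triangular decorrelation term that orders the components.
    W += eta * (np.outer(y, x) - np.tril(np.outer(y, y)) @ W)

# Each row converges (up to sign) to a leading eigenvector of the data
# covariance, here approximately [1, 0, 0] and [0, 1, 0].
print(W / np.linalg.norm(W, axis=1, keepdims=True))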

A principal component a_j is extracted from a dataset x through some associated vector q_j, or a_j = q_j ⋅ x, and we can restore our original dataset by taking

x = \sum_{j} a_j q_j.

In the case of a single neuron trained by Oja's rule, we find the weight vector converges to q_1, the first principal component, as time or the number of iterations approaches infinity.

We can also define, given a set of input vectors X_i, that its correlation matrix R_ij = ⟨X_i X_j⟩ has an associated eigenvector given by q_j with eigenvalue λ_j.

The variance of the outputs of our Oja neuron, σ²(n) = ⟨y²(n)⟩, then converges with time iterations to the principal eigenvalue, or

\lim_{n \to \infty} \sigma^{2}(n) = \lambda_1.

These results are derived using Lyapunov function analysis, and they show that Oja's neuron necessarily converges on strictly the first principal component if certain conditions are met in our original learning rule.
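As an informal numerical check of both statements (again a sketch with invented data; with a constant learning rate the convergence is only approximate), a single Oja neuron trained on Gaussian data with a known correlation matrix ends up with a weight vector close to q_1 and an output variance close to λ_1:

import numpy as np

rng = np.random.default_rng(2)
# Zero-mean 2-D data with correlation matrix C (eigenvalues about 4.8 and 1.2).
C = np.array([[4.0, 1.5],
              [1.5, 2.0]])
X = rng.normal(size=(50000, 2)) @ np.linalg.cholesky(C).T

w = rng.normal(scale=0.1, size=2)
eta = 0.005
outputs = []
for x in X:
    y = w @ x
    w += eta * y * (x - y * w)       # Oja's rule
    outputs.append(y)

evals, evecs = np.linalg.eigh(C)     # eigenvalues in ascending order
print("learned w (normalized):", w / np.linalg.norm(w))
print("q_1 (up to sign):      ", evecs[:, -1])
print("output variance:       ", np.var(outputs[-10000:]))
print("lambda_1:              ", evals[-1])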

Most importantly, our learning rate η is allowed to vary with time, but only such that its sum is divergent while its power sum is convergent, that is,

\sum_{n=1}^{\infty} \eta(n) = \infty, \qquad \sum_{n=1}^{\infty} \eta(n)^{p} < \infty, \quad p > 1.

Our output activation function y(x(n)) is also allowed to be nonlinear and nonstatic, but it must be continuously differentiable in both x and w and have derivatives bounded in time.
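For example (a standard textbook choice rather than one prescribed by the source), the schedule η(n) = c/n with a constant c > 0 satisfies both conditions,

\sum_{n=1}^{\infty} \frac{c}{n} = \infty, \qquad \sum_{n=1}^{\infty} \left( \frac{c}{n} \right)^{p} < \infty \quad \text{for } p > 1,

since the harmonic series diverges while the corresponding p-series converges.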

PCA has also had a long history of use before Oja's rule formalized its use in network computation in 1989.

The model can thus be applied to any problem of self-organizing mapping, in particular those in which feature extraction is of primary interest.

It is also useful as it expands easily to higher dimensions of processing, thus being able to integrate multiple outputs quickly.

Such a derivation requires retrograde signalling from the postsynaptic neuron, which is biologically plausible (see neural backpropagation), and takes the form

\Delta w_{ij} \propto \langle x_i y_j \rangle - \varepsilon \left\langle \left( c_i^{\mathrm{pre}} * \sum_{k} w_{ik} \, c_k^{\mathrm{post}} \right) y_j \right\rangle,

where as before w_ij is the synaptic weight between the ith input and jth output neurons, x is the input, y is the postsynaptic output, and we define ε to be a constant analogous to the learning rate, and c^pre and c^post are presynaptic and postsynaptic functions that model the weakening of signals over time.