Catastrophic interference

Specifically, the problem is that of making an artificial neural network that is sensitive to, but not disrupted by, new information.[4]

The term catastrophic interference was originally coined by McCloskey and Cohen (1989) and was brought to the wider attention of the scientific community by research from Ratcliff (1990).[6]

McCloskey and Cohen (1989) noted the problem of catastrophic interference during two different experiments with backpropagation neural network modelling.

Furthermore, the problems 2+1 and 1+2, which were included in both training sets, showed dramatic disruption even during the first learning trials of the twos facts.

In their second connectionist model, McCloskey and Cohen attempted to replicate the study on retroactive interference in humans by Barnes and Underwood (1959).

When the model was trained concurrently on the A-B and A-C items, the network readily learned all of the associations correctly.

Overall, McCloskey and Cohen (1989) concluded that catastrophic interference is a fundamental problem for connectionist networks trained sequentially by backpropagation.

Ratcliff (1990) used multiple sets of backpropagation models applied to standard recognition memory procedures, in which the items were sequentially learned.

The main cause of catastrophic interference seems to be overlap in the representations at the hidden layer of distributed neural networks.
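The effect can be seen even in a toy setting. The sketch below is an illustrative assumption, not taken from the studies above: a single linear unit first learns association A, then is trained only on association B, whose input pattern shares a feature with A's; because the shared weight is reused, the response to A drifts away from its learned target.

```python
import numpy as np

# Illustrative toy example (not from the cited studies): a linear unit
# trained with the delta rule on two associations whose inputs overlap.
x_a = np.array([1.0, 1.0, 0.0])  # association A (shares feature 1 with B)
x_b = np.array([0.0, 1.0, 1.0])  # association B
w = np.zeros(3)
lr = 0.1

# Phase 1: learn A -> 1.0
for _ in range(200):
    w += lr * (1.0 - w @ x_a) * x_a
out_a_before = w @ x_a            # close to the target 1.0

# Phase 2: train only on B -> 0.0, never revisiting A
for _ in range(200):
    w += lr * (0.0 - w @ x_b) * x_b
out_a_after = w @ x_a             # has drifted away from 1.0

print(out_a_before, out_a_after)
```

Learning B pushes the shared weight toward B's target, so knowledge of A is partially overwritten even though A was never retrained.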

Below are a number of techniques that have empirical support in successfully reducing catastrophic interference in backpropagation neural networks.

Many of the early techniques for reducing representational overlap involved making either the input vectors or the hidden-unit activation patterns orthogonal to one another.

Lewandowsky and Li (1995)[12] noted that the interference between sequentially learned patterns is minimized if the input vectors are orthogonal to each other.
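In the linear case this claim can be checked directly. The sketch below (an assumed minimal setup, not Lewandowsky and Li's own simulations) uses the fact that each delta-rule update for pattern B moves the weights along x_b, so if x_a and x_b are orthogonal the unit's response to A is left exactly unchanged.

```python
import numpy as np

# Assumed minimal setup: a linear unit with orthogonal input patterns.
x_a = np.array([1.0, -1.0, 0.0, 0.0])
x_b = np.array([0.0, 0.0, 1.0, 1.0])   # orthogonal to x_a
assert x_a @ x_b == 0

w = np.array([0.3, 0.1, -0.2, 0.4])
out_a_before = w @ x_a

lr = 0.1
for _ in range(100):                    # train only on B -> target 1.0
    w += lr * (1.0 - w @ x_b) * x_b

out_a_after = w @ x_a
print(out_a_before == out_a_after)      # True: zero interference
```

Every update is a multiple of x_b, which has zero projection onto x_a, so sequential learning of B cannot disturb A at all in this setting.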

Neural networks that employ very localized representations do not show catastrophic interference because of the lack of overlap at the hidden layer.

Consequently, the novelty rule changes only the weights that were not previously dedicated to storing information, thereby reducing the overlap in representations at the hidden units.[13]

However, a limitation is that the novelty rule can only be used with auto-encoder or auto-associative networks, in which the target response for the output layer is identical to the input pattern.

McRae and Hetherington (1993)[9] argued that humans, unlike most neural networks, do not take on new learning tasks with a random set of weights.

Rather, people tend to bring a wealth of prior knowledge to a task and this helps to avoid the problem of interference.

In the pseudo-recurrent network, one of the sub-networks acts as an early processing area, akin to the hippocampus, and functions to learn new input patterns.

Inspired by,[14] and independently of,[5] Ans and Rousset (1997)[16] also proposed a two-network artificial neural architecture with memory self-refreshing that overcomes catastrophic interference when sequential learning tasks are carried out in distributed networks trained by backpropagation.

What mainly distinguishes this model from those that use classical pseudorehearsal[14][5] in feedforward multilayer networks is a reverberating process that is used for generating pseudopatterns.

After a number of activity re-injections from a single random seed, this process tends to settle into nonlinear network attractors, which are better suited than the single feedforward pass of activity used in pseudorehearsal to capturing the deep structure of the knowledge distributed within the connection weights.

The memory self-refreshing procedure proved highly effective in transfer processes[17] and in the serial learning of temporal sequences of patterns without catastrophic forgetting.
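The pseudopattern idea that these architectures share can be sketched as follows. The network and function names below are illustrative assumptions: a frozen copy of the previously trained network turns random input activity into input/target pseudopatterns, which can then be interleaved with the new task's training items; the Ans and Rousset variant would additionally reverberate the activity through the network several times rather than use a single feedforward pass.

```python
import numpy as np

# Illustrative sketch of pseudopattern generation for pseudorehearsal.
rng = np.random.default_rng(0)

def old_network(x, W1, W2):
    h = np.tanh(x @ W1)               # hidden layer of the frozen copy
    return h @ W2

# Hypothetical weights standing in for a network trained on earlier tasks.
W1 = rng.normal(size=(5, 8))
W2 = rng.normal(size=(8, 3))

def make_pseudopatterns(n):
    xs = rng.normal(size=(n, 5))      # random seed activity
    # A reverberating variant would re-inject activity here repeatedly;
    # this sketch keeps the single feedforward pass of classical
    # pseudorehearsal.
    return [(x, old_network(x, W1, W2)) for x in xs]

pseudo = make_pseudopatterns(4)
# By construction, each pseudo-target is the frozen network's response,
# so rehearsing these pairs pulls the new network back toward its old
# input-output mapping.
print(len(pseudo))
```

Mixing such pseudopatterns into the new task's training batches approximates interleaved training without needing to store the original training items.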

Insights into the mechanisms of memory consolidation during sleep in the human and animal brain have led to other biologically inspired approaches.

Kirkpatrick et al. (2017)[29] proposed elastic weight consolidation (EWC), a method to sequentially train a single artificial neural network on multiple tasks.

To estimate the importance of the network weights, EWC uses probabilistic mechanisms, in particular the Fisher information matrix, but this can be done in other ways as well.
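The core of EWC is a quadratic penalty that anchors each weight to its task-A value in proportion to its estimated importance. The sketch below shows only this penalty term (the symbols follow the usual presentation; `lam` and the example values are illustrative assumptions):

```python
import numpy as np

# EWC penalty: 0.5 * lam * sum_i F_i * (theta_i - theta_A_i)^2,
# where F_i is the diagonal Fisher information estimated on task A
# and theta_A are the weights learned for task A.
def ewc_penalty(theta, theta_A, fisher, lam=1.0):
    return 0.5 * lam * np.sum(fisher * (theta - theta_A) ** 2)

theta_A = np.array([0.5, -1.0, 2.0])   # weights after task A (assumed)
fisher  = np.array([4.0,  0.0, 1.0])   # importance of each weight (assumed)

# Moving an important weight (F=4) incurs a penalty ...
print(ewc_penalty(np.array([1.0, -1.0, 2.0]), theta_A, fisher))  # 0.5
# ... while an unimportant weight (F=0) is free to adapt to the new task.
print(ewc_penalty(np.array([0.5,  5.0, 2.0]), theta_A, fisher))  # 0.0
```

During training on task B, this penalty is added to the new task's loss, so gradient descent trades off new-task error against disturbing the weights that mattered for task A.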

Catastrophic remembering may occur when catastrophic interference is eliminated by using a large representative training set or sufficiently many sequential memory sets (memory replay or data rehearsal): discrimination then breaks down between input patterns that have been learned and those that have not.[35]

Figure 2: The architecture of a pseudo-recurrent network