Sparse distributed memory

[1] This memory exhibits behaviors, both in theory and in experiment, that resemble abilities previously out of reach for machines, such as rapid recognition of faces or odors and the discovery of new connections between seemingly unrelated ideas.

It arose from the observation that the distances between points of a high-dimensional space resemble the proximity relations between concepts in human memory.

[9][10] An important property of such high-dimensional spaces is that two randomly chosen vectors are relatively far away from each other, meaning that they are uncorrelated.
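A minimal sketch (in Python with NumPy; the dimensions and seed are arbitrary choices, not from the original text) can illustrate this: the normalized Hamming distance between two randomly chosen binary vectors concentrates near 0.5, i.e., about half of the bits differ, as the dimension grows.

```python
import numpy as np

rng = np.random.default_rng(0)

for n in (10, 100, 1000, 10000):
    # Draw two random binary vectors of dimension n.
    x = rng.integers(0, 2, size=n)
    y = rng.integers(0, 2, size=n)
    # Normalized Hamming distance: fraction of positions where they differ.
    dist = np.mean(x != y)
    print(f"n={n:6d}  normalized Hamming distance = {dist:.3f}")
```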

Kanerva's proposal is based on four basic ideas.[12] The SDM works with n-dimensional vectors with binary components.

Depending on the context, the vectors are called points, patterns, addresses, words, memory items, data, or events.

The SDM may be regarded either as a content-addressable extension of a classical random-access memory (RAM) or as a special type of three-layer feedforward neural network.

This mechanism is complementary to adjustable synapses or adjustable weights in a neural network (perceptron convergence learning): the fixed accessing mechanism serves as a permanent frame of reference that selects the synapses in which information is stored and from which it is retrieved under a given set of circumstances.
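The fixed accessing mechanism can be made concrete with a hedged sketch (the class name, dimensions, number of hard locations, and activation radius below are illustrative choices, not values from the original design): randomly placed hard locations hold counter vectors, a write increments the counters of every location within a Hamming radius of the write address, and a read sums the counters of the activated locations and thresholds the result.

```python
import numpy as np

class SDM:
    """Minimal sparse distributed memory sketch (illustrative parameters)."""

    def __init__(self, n=256, num_locations=2000, radius=118, seed=0):
        rng = np.random.default_rng(seed)
        self.n = n
        self.radius = radius                       # Hamming activation radius
        # Fixed, randomly chosen hard-location addresses: the permanent
        # "frame of reference" that selects where data is stored and retrieved.
        self.addresses = rng.integers(0, 2, size=(num_locations, n))
        self.counters = np.zeros((num_locations, n), dtype=int)

    def _active(self, address):
        # Activate every hard location within `radius` bits of the address.
        dists = np.sum(self.addresses != address, axis=1)
        return dists <= self.radius

    def write(self, address, data):
        # Add the data pattern as +1/-1 increments to the activated counters.
        self.counters[self._active(address)] += 2 * data - 1

    def read(self, address):
        # Sum the activated counters and threshold at zero to recover a pattern.
        sums = self.counters[self._active(address)].sum(axis=0)
        return (sums > 0).astype(int)

rng = np.random.default_rng(1)
mem = SDM()
pattern = rng.integers(0, 2, size=256)
mem.write(pattern, pattern)                        # autoassociative store
noisy = pattern.copy()
noisy[rng.choice(256, size=20, replace=False)] ^= 1   # corrupt 20 bits
recovered = mem.read(noisy)
print("bits still wrong after one read:", int(np.sum(recovered != pattern)))
```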

SDM assumes that the address patterns actually describing physical situations of interest are sparsely scattered throughout the input space.

Unlike a conventional Turing machine, SDM takes advantage of parallel computing by the address decoders.

The corresponding critical distance of a sparse distributed memory can be evaluated approximately by minimizing a constrained error expression.

An associative memory system using sparse, distributed representations can be reinterpreted as an importance sampler, a Monte Carlo method of approximating Bayesian inference.

The SDM will produce acceptable responses from a training set when this approximation is valid, that is, when the training set contains sufficient data to provide good estimates of the underlying joint probabilities and there are enough Monte Carlo samples to obtain an accurate estimate of the integral.

Theoretical work on SDM by Kanerva has suggested that sparse coding increases the capacity of associative memory by reducing overlap between representations.
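A small numerical illustration of the overlap argument (the dimension, sparsity levels, and trial count are arbitrary): as the fraction of active bits in a random binary code decreases, the expected number of positions active in two independent codes drops sharply.

```python
import numpy as np

rng = np.random.default_rng(0)
n, trials = 10000, 100

for active_fraction in (0.5, 0.1, 0.01):
    k = int(active_fraction * n)              # number of active bits per code
    overlaps = []
    for _ in range(trials):
        a = np.zeros(n, dtype=int); a[rng.choice(n, k, replace=False)] = 1
        b = np.zeros(n, dtype=int); b[rng.choice(n, k, replace=False)] = 1
        overlaps.append(int(np.sum(a & b)))    # positions active in both codes
    print(f"active fraction {active_fraction:>4}: mean overlap = {np.mean(overlaps):.1f} bits")
```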

Experimentally, sparse representations of sensory information have been observed in many systems, including vision,[21] audition,[22] touch,[23] and olfaction.

In 2014, Gero Miesenböck's lab at the University of Oxford made progress in analyzing the Drosophila olfactory system.

[25] In Drosophila, sparse odor coding by the Kenyon cells of the mushroom body is thought to generate a large number of precisely addressable locations for the storage of odor-specific memories.

Lin et al.[26] demonstrated that sparseness is controlled by a negative feedback circuit between Kenyon cells and the GABAergic anterior paired lateral (APL) neuron.

These results suggest that feedback inhibition suppresses Kenyon cell activity to maintain sparse, decorrelated odor coding and thus the odor-specificity of memories.

A 2017 publication in Science[27] showed that the fly olfactory circuit implements an improved version of binary locality-sensitive hashing via sparse, random projections.
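A rough sketch of that style of hashing (the input size, expansion factor, projection density, and hash length here are illustrative assumptions, not the values reported in the paper): the input is expanded through a sparse random binary projection into a much higher-dimensional space, and a winner-take-all step keeps only the top few percent of units as a sparse binary tag, so similar inputs receive overlapping tags.

```python
import numpy as np

rng = np.random.default_rng(0)

d_in, d_out = 50, 2000        # illustrative: small input, large expansion
proj_density = 0.1            # each output unit samples ~10% of the inputs
hash_k = 100                  # keep the top 5% of outputs as the binary tag

# Sparse, random binary projection matrix.
projection = (rng.random((d_out, d_in)) < proj_density).astype(float)

def fly_hash(x):
    """Expand, then apply winner-take-all to produce a sparse binary hash."""
    activity = projection @ x
    tag = np.zeros(d_out, dtype=int)
    tag[np.argsort(activity)[-hash_k:]] = 1     # top-k winner-take-all
    return tag

# Similar inputs should receive strongly overlapping tags; unrelated inputs
# should overlap only by chance.
x = rng.random(d_in)
x_similar = x + 0.05 * rng.random(d_in)
x_other = rng.random(d_in)
print("overlap with similar input:  ", int(np.sum(fly_hash(x) & fly_hash(x_similar))))
print("overlap with unrelated input:", int(np.sum(fly_hash(x) & fly_hash(x_other))))
```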

The memory stores such a sequence of patterns and can recreate it later in the focus if addressed with a pattern similar to one encountered in the past.
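A simplified sketch of sequence recall under a noisy cue (this stands in for the full hard-location machinery with a plain nearest-neighbor lookup, and all sizes are arbitrary): each pattern in a stored sequence, used as an address, is associated with the next pattern, and a corrupted version of the first pattern is enough to replay the rest.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 64

# A sequence of binary patterns; each pattern, used as an address, stores the
# next pattern in the sequence (heteroassociative storage, simplified here).
sequence = [rng.integers(0, 2, size=n) for _ in range(5)]
stored_addresses = np.array(sequence[:-1])
stored_data = np.array(sequence[1:])

def recall_next(cue):
    # Retrieve the datum whose stored address is closest in Hamming distance.
    dists = np.sum(stored_addresses != cue, axis=1)
    return stored_data[np.argmin(dists)]

# Cue with a corrupted version of the first pattern and replay the sequence.
cue = sequence[0].copy()
cue[rng.choice(n, 6, replace=False)] ^= 1          # flip 6 of 64 bits
for step in range(1, len(sequence)):
    cue = recall_next(cue)
    print(f"step {step}: matches stored pattern?", bool(np.all(cue == sequence[step])))
```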

SDM can be applied in transcribing speech, with the training consisting of "listening" to a large corpus of spoken language.

During training, that is, while listening to speech, the SDM builds a probabilistic structure with the highest incidence of branching at word boundaries.

In transcribing speech, these branching points are detected and tend to break the stream into segments that correspond to words.
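A toy sketch of the branching idea (the corpus, context length, and threshold are invented for illustration): counting the distinct successors of every short context in an unsegmented stream, positions whose context has many possible successors are flagged as candidate boundaries.

```python
from collections import defaultdict

# Toy "speech" stream: words run together with no explicit boundaries.
corpus = "thecatsatonthematthedogsatonthecat" * 20
context_len = 3

# Count the distinct successors of every length-3 context.
successors = defaultdict(set)
for i in range(len(corpus) - context_len):
    context = corpus[i:i + context_len]
    successors[context].add(corpus[i + context_len])

# Scan a fresh stream and flag positions whose context has many successors:
# high branching suggests a segment boundary.
stream = "thedogsatonthemat"
for i in range(context_len, len(stream)):
    context = stream[i - context_len:i]
    branching = len(successors[context])
    marker = " <-- likely boundary" if branching > 1 else ""
    print(f"{stream[:i]}|{stream[i:]}  branching={branching}{marker}")
```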

[7] At the University of Memphis, Uma Ramamurthy, Sidney K. D'Mello, and Stan Franklin created a modified version of the sparse distributed memory system that represents "realizing forgetting."

[31] SDM has been applied to statistical prediction, the task of associating extremely large perceptual state vectors with future events.

In conditions of near- or over-capacity, where the associative-memory behavior of the model breaks down, the processing performed by the model can be interpreted as that of a statistical predictor, and each data counter in an SDM can be viewed as an independent estimate of the conditional probability of a binary function f, conditioned on the activation set defined by the counter's memory location.
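As a toy illustration of that reading of the counters (the numbers below are made up): a counter stores the running difference between writes of 1 and writes of 0 at its position, so together with the number of writes that activated the location it yields an estimate of the conditional probability that the corresponding bit is 1.

```python
import numpy as np

# Toy counter state for one memory location: each entry is the running sum of
# +1 (bit written as 1) and -1 (bit written as 0) over all writes that
# activated this location.
counter = np.array([7, -5, 1, 3, -9, -1])
writes_to_location = 11          # how many writes activated this location

# The counter stores (#ones - #zeros); with the total number of writes we can
# recover an estimate of P(bit = 1 | location activated).
num_ones = (writes_to_location + counter) / 2
p_bit_is_one = num_ones / writes_to_location
print(np.round(p_bit_is_one, 2))
```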

[32] SDMs provide a linear, local function approximation scheme, designed to work when a very large/high-dimensional input (address) space has to be mapped into a much smaller physical memory.

[37] The work in Ratitch et al.[38] combined the SDM memory model with the ideas from memory-based learning, which provides an approximator that can dynamically adapt its structure and resolution in order to locate regions of the state space that are "more interesting"[39] and allocate proportionally more memory resources to model them accurately.

Dana H. Ballard's lab[40] demonstrated a general-purpose object indexing technique for computer vision that combines the virtues of principal component analysis with the favorable matching properties of high-dimensional spaces to achieve high precision recognition.

The indexing algorithm uses an active vision system in conjunction with a modified form of SDM and provides a platform for learning the association between an object's appearance and its identity.

Candidate decay functions include the exponential decay function and the negated-translated sigmoid function.
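A hedged sketch of how such decay functions might be applied, assuming they serve the forgetting mechanism described above and assuming these particular functional forms and parameters (none of which are taken from the cited implementation): counter contents are scaled by a weight that falls with the age of the trace, so traces that are not refreshed gradually fade.

```python
import numpy as np

def exponential_decay(age, rate=0.05):
    # Weight falls off exponentially with the age of the trace (assumed form).
    return np.exp(-rate * age)

def negated_translated_sigmoid(age, rate=0.2, midpoint=30.0):
    # Weight stays near 1 while the trace is young, then drops off around the
    # midpoint (assumed form of a "negated, translated" sigmoid).
    return 1.0 - 1.0 / (1.0 + np.exp(-rate * (age - midpoint)))

ages = np.array([0, 10, 30, 60])          # time since each trace was written
counters = np.array([8.0, 8.0, 8.0, 8.0]) # identical traces of different ages
print("exponential decay:         ", np.round(counters * exponential_decay(ages), 2))
print("negated-translated sigmoid:", np.round(counters * negated_translated_sigmoid(ages), 2))
```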