The neural gas is a simple algorithm for finding optimal data representations based on feature vectors.
The adaptation step of the neural gas can be interpreted as gradient descent on a cost function.
By adapting not only the closest feature vector but all of them, with a step size that decreases with increasing distance order, the algorithm converges much more robustly than (online) k-means clustering.
Unlike a self-organizing map (SOM), the neural gas model does not assume that particular vectors are neighbors.
The name "neural gas" comes from imagining what an SOM would be like if there were no underlying graph and all points were free to move without the bonds that bind them together.
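A single adaptation step can be sketched as follows. This is a minimal illustration, not a reference implementation: the names `neural_gas_step`, `eps` (base step size) and `decay_range` are hypothetical, and the exponential decay of the step size with the distance rank is an assumption (it is the form commonly used for neural gas, but the text above only says that the step size decreases with distance order).

```python
import math


def neural_gas_step(codebook, x, eps, decay_range):
    """Adapt every feature vector toward the sample x (in place); return the ranks."""
    # Squared Euclidean distance from the sample to each feature vector.
    dists = [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in codebook]
    # order[k] is the index of the k-th closest feature vector.
    order = sorted(range(len(codebook)), key=lambda i: dists[i])
    ranks = [0] * len(codebook)
    for k, i in enumerate(order):
        ranks[i] = k
        # Assumed schedule: the step size decays exponentially with rank k.
        step = eps * math.exp(-k / decay_range)
        codebook[i] = [wi + step * (xi - wi) for xi, wi in zip(x, codebook[i])]
    return ranks


codebook = [[0.0, 0.0], [1.0, 1.0], [4.0, 4.0]]
ranks = neural_gas_step(codebook, [0.5, 0.0], eps=0.5, decay_range=1.0)
```

Every feature vector moves toward the sample, but the closest one (rank 0) moves the most. Setting the step size to zero for all ranks above 0 would reduce this to an online k-means update of the winner only.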
A number of variants of the neural gas algorithm exist in the literature to mitigate some of its shortcomings.
A performance-oriented approach that avoids the risk of overfitting is the Plastic Neural Gas model.
Growing neural gas (GNG) has been widely used in several domains,[9] demonstrating its capabilities for clustering data incrementally.
Since in the GNG input data is presented sequentially, one sample at a time, the following steps are followed at each iteration:
Another neural gas variant inspired by the GNG algorithm is the incremental growing neural gas (IGNG).[7]
Having a network with a growing set of nodes, like the one implemented by the GNG algorithm, was seen as a great advantage. However, the parameter λ introduced a limitation on learning: the network could grow only when the iteration count was a multiple of this parameter.
The "Plastic Neural Gas" model[8] addresses the risk of overfitting by deciding when to add or remove nodes using an unsupervised version of cross-validation, which controls an equivalent notion of "generalization ability" for the unsupervised setting.
While growing-only methods only cater for the incremental learning scenario, the ability to grow and shrink is suited to the more general streaming data problem.
To rank the feature vectors by their distance to the input, the neural gas algorithm involves sorting, a procedure that does not lend itself easily to parallelization or implementation in analog hardware.
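The ranking that the sort produces can also be stated directly: the rank of a feature vector is the number of feature vectors that lie strictly closer to the input. The sketch below computes the ranks both ways and is only an illustration. The function names are hypothetical, squared Euclidean distance is an assumption, and the two versions agree only when no two distances are tied.

```python
def squared_distances(codebook, x):
    """Squared Euclidean distance from the input x to each feature vector."""
    return [sum((xi - wi) ** 2 for xi, wi in zip(x, w)) for w in codebook]


def ranks_by_sorting(codebook, x):
    """Rank each feature vector by sorting the indices on distance."""
    dists = squared_distances(codebook, x)
    order = sorted(range(len(dists)), key=lambda i: dists[i])
    ranks = [0] * len(dists)
    for k, i in enumerate(order):
        ranks[i] = k
    return ranks


def ranks_by_counting(codebook, x):
    """Rank of vector i = number of vectors strictly closer to x."""
    dists = squared_distances(codebook, x)
    return [sum(1 for d_j in dists if d_j < d_i) for d_i in dists]


codebook = [[4.0, 4.0], [0.0, 0.0], [1.0, 1.0]]
sorted_ranks = ranks_by_sorting(codebook, [0.5, 0.0])
counted_ranks = ranks_by_counting(codebook, [0.5, 0.0])
```

The counting form needs a quadratic number of comparisons, whereas the sort-based form is cheaper but processes the whole set of distances at once.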