Growing self-organizing map

A growing self-organizing map (GSOM) is a growing variant of the self-organizing map (SOM). The GSOM was developed to address the problem of identifying a suitable map size in the SOM.

A parameter called the spread factor (SF) lets the data analyst control the growth of the GSOM.
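As an illustration of how the spread factor can control growth, the GSOM literature commonly derives a growth threshold GT from the spread factor and the data dimension D as GT = -D × ln(SF). The sketch below assumes that formula and a spread factor strictly between 0 and 1; the function name `growth_threshold` is hypothetical and not taken from any particular library.

```python
import math


def growth_threshold(dimensions: int, spread_factor: float) -> float:
    """Growth threshold GT = -D * ln(SF), assuming 0 < SF < 1.

    A node would be allowed to grow once its accumulated error exceeds GT
    (assumption based on the usual GSOM description, not stated in this text).
    """
    if dimensions < 1:
        raise ValueError("dimensions must be a positive integer")
    if not 0.0 < spread_factor < 1.0:
        raise ValueError("spread_factor must lie strictly between 0 and 1")
    return -dimensions * math.log(spread_factor)


# A low spread factor gives a high threshold (little growth);
# a high spread factor gives a low threshold (more growth).
coarse_map_threshold = growth_threshold(dimensions=2, spread_factor=0.1)
fine_map_threshold = growth_threshold(dimensions=2, spread_factor=0.9)
```

Under this assumption, a spread factor close to 1 produces a larger, more detailed map, while a value close to 0 keeps the map small.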

The figure shows the three possible node growth options for a rectangular GSOM.
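The three growth options can be illustrated with a small sketch. It assumes that nodes sit on integer grid coordinates, that a node has at most four direct neighbours (left, right, down, up), and that growth fills every free neighbouring position of the selected boundary node. The function name `new_node_positions` and the 2×2 starting map are hypothetical choices for this example.

```python
def new_node_positions(node, occupied):
    """Return the free 4-neighbour grid positions around `node`.

    `node` is an (x, y) tuple and `occupied` is a set of (x, y) tuples
    holding the positions of all existing nodes.
    """
    x, y = node
    candidates = [(x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1)]
    return [pos for pos in candidates if pos not in occupied]


# Hypothetical 2x2 starting map: every node is a corner with two free sides.
start_map = {(0, 0), (1, 0), (0, 1), (1, 1)}
corner_growth = new_node_positions((0, 0), start_map)
```

A node with three occupied neighbours yields one new node, a node with two occupied neighbours yields two, and a node with a single occupied neighbour yields three. A node whose four neighbours are all occupied is not a boundary node and yields none.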

The GSOM can be used for many preprocessing tasks in data mining, for nonlinear dimensionality reduction, for approximation of principal curves and manifolds, and for clustering and classification.

It often gives a better representation of the data geometry than the SOM (see the classical benchmark for principal curves, a spiral with noise).

Node growth options in GSOM: (a) one new node, (b) two new nodes and (c) three new nodes.
Approximation of a spiral with noise by 1D SOM (upper row) and GSOM (lower row) with 50 (first column) and 100 (second column) nodes. The fraction of variance unexplained is: (a) 4.68% (SOM, 50 nodes); (b) 1.69% (SOM, 100 nodes); (c) 4.20% (GSOM, 50 nodes); (d) 2.32% (GSOM, 100 nodes). The initial approximation for the SOM was an equidistribution of nodes in a segment on the first principal component with the same variance as the data set. The initial approximation for the GSOM was the mean point. [1]