Targeted projection pursuit

Targeted projection pursuit is a statistical technique for exploratory data analysis, information visualization, and feature selection.

It allows the user to interactively explore high-dimensional data (typically with tens to hundreds of attributes) to find features or patterns of potential interest.

Conventional, or 'blind', projection pursuit finds the most "interesting" possible projections of multidimensional data, using a search algorithm that optimizes some fixed criterion of "interestingness", such as deviation from a normal distribution.
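A minimal sketch of blind projection pursuit may make the idea concrete. It assumes the absolute excess kurtosis of a one-dimensional projection as the non-normality criterion and a plain random search as the optimizer. Both are illustrative choices rather than a method named in this article, and the names `excess_kurtosis` and `blind_pursuit_1d` are hypothetical.

```python
import numpy as np

def excess_kurtosis(z):
    """Excess kurtosis of a 1-D sample; close to 0 for normally distributed data."""
    z = (z - z.mean()) / z.std()
    return np.mean(z ** 4) - 3.0

def blind_pursuit_1d(X, n_candidates=2000, seed=0):
    """Random search for the unit direction whose 1-D projection looks least normal."""
    rng = np.random.default_rng(seed)
    Xc = X - X.mean(axis=0)                      # center the data
    best_w, best_score = None, -np.inf
    for _ in range(n_candidates):
        w = rng.normal(size=X.shape[1])
        w /= np.linalg.norm(w)                   # candidate unit direction
        score = abs(excess_kurtosis(Xc @ w))     # assumed "interestingness" index
        if score > best_score:
            best_w, best_score = w, score
    return best_w, best_score

# Synthetic data (made up for illustration): attribute 0 is bimodal, the rest are Gaussian.
rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 5))
X[:, 0] += rng.choice([-3.0, 3.0], size=1000)

best_w, best_score = blind_pursuit_1d(X)
print("direction:", np.round(best_w, 2), "score:", round(best_score, 2))
```

In this toy setting the search should favor directions dominated by the bimodal attribute, since that is the only non-normal structure present. The criterion is fixed in advance and the user has no way to steer the result.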

In contrast, targeted projection pursuit allows the user to explore the space of projections by manipulating data points directly in an interactive scatter plot.
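One way to picture the interaction is as a fitting problem: the user drags points to a desired two-dimensional arrangement, and the system looks for a linear projection of the data that comes as close to that arrangement as it can. The sketch below assumes an ordinary least-squares fit for that step. This article does not state how the projection is computed, so the fitting method and the names `fit_projection`, `corners`, and `target` are assumptions made for illustration.

```python
import numpy as np

def fit_projection(X, target):
    """Least-squares projection matrix P (attributes x 2) such that the
    centered data times P approximates the user's target view."""
    Xc = X - X.mean(axis=0)
    offset = target.mean(axis=0)                 # translation of the view is fitted separately
    P, *_ = np.linalg.lstsq(Xc, target - offset, rcond=None)
    view = Xc @ P + offset                       # the closest achievable linear view
    return P, view

# Synthetic data (made up): 3 classes of 20 points, 10 attributes,
# with attribute k elevated in class k.
rng = np.random.default_rng(0)
labels = np.repeat([0, 1, 2], 20)
X = rng.normal(size=(60, 10))
X[np.arange(60), labels] += 6.0

# Stand-in for the user's dragging: each class is pulled toward its own corner.
corners = np.array([[-1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
target = corners[labels]

P, view = fit_projection(X, target)
fit_error = np.linalg.norm(view - target)
baseline = np.linalg.norm(target - target.mean(axis=0))
print(f"fit error {fit_error:.2f} vs. baseline {baseline:.2f}")
```

Because the view is constrained to be a linear projection of the data, the target is matched only as far as the data support it. Classes that are not linearly separable in the attributes stay mixed however the points are dragged.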

Targeted projection pursuit has found applications in DNA microarray data analysis,[1] protein sequence analysis,[2] graph layout,[3] and digital signal processing.

An example of targeted projection pursuit
In this example, targeted projection pursuit is used to explore projections of a gene expression data set. Each of the 122 points corresponds to a tumor sample belonging to one of four diagnostic classes (represented by color). For each sample, the expression levels of 100 genes were recorded (represented by the axes). The animation shows that targeted projection pursuit separates two of the classes clearly (red and purple) but cannot distinguish the other two (blue and green). The positions of the axes then indicate which genes' expression levels are most strongly associated with each class.
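Reading the axes can be sketched numerically. If the view is a linear projection, each attribute's axis is the corresponding row of the projection matrix, and a longer axis means that attribute contributes more to the view. The snippet below ranks attributes by axis length. The matrix values and gene names are made up for illustration, and `rank_axes` is a hypothetical helper.

```python
import numpy as np

def rank_axes(P, names):
    """Sort attribute names by the length of their axis in the 2-D view, longest first."""
    lengths = np.linalg.norm(P, axis=1)
    order = np.argsort(-lengths)
    return [(names[i], float(lengths[i])) for i in order]

# Made-up projection matrix: each row is one gene's axis endpoint in the plot.
P = np.array([[0.90, 0.10],
              [0.00, 0.05],
              [-0.20, 0.70],
              [0.01, 0.00]])
names = ["gene_a", "gene_b", "gene_c", "gene_d"]

ranking = rank_axes(P, names)
for name, length in ranking:
    print(f"{name}: {length:.3f}")
```

The direction of an axis matters as well as its length: an axis pointing toward a class's cluster suggests higher expression of that gene in the class.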