Factorial code

A naive Bayes classifier assumes that the pixels are statistically independent random variables, and it therefore produces poor results when the pixels are in fact highly redundant.
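The effect of the independence assumption can be illustrated with a minimal sketch. It assumes a toy "image" of two binary pixels that always agree, within a single class; the helper names `bit_marginals` and `independent_model` are hypothetical and chosen only for this illustration.

```python
from itertools import product


def bit_marginals(joint, n_bits):
    """Return P(bit i = 1) for each position i of a distribution over bit tuples."""
    return [sum(p for word, p in joint.items() if word[i] == 1)
            for i in range(n_bits)]


def independent_model(joint, n_bits):
    """Product of per-bit marginals: what an independence assumption predicts."""
    ones = bit_marginals(joint, n_bits)
    model = {}
    for word in product((0, 1), repeat=n_bits):
        prob = 1.0
        for bit, p_one in zip(word, ones):
            prob *= p_one if bit == 1 else 1.0 - p_one
        model[word] = prob
    return model


# Toy data (an assumption for this sketch): two pixels that are always equal.
pixels = {(0, 0): 0.5, (1, 1): 0.5}
model = independent_model(pixels, 2)

# Probability mass the independence assumption places on images that never occur.
wasted = sum(p for word, p in model.items() if word not in pixels)
print(model)
print(wasted)
```

In this toy case the independent model spreads its probability evenly over all four pixel patterns, so half of its mass falls on the two patterns that never occur.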

Jürgen Schmidhuber (1992) reformulated the problem in terms of predictors and binary feature detectors, each receiving the raw data as input.

The global optimum of this objective function corresponds to a factorial code represented in a distributed fashion across the outputs of the feature detectors.
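Whether a given binary code is factorial can be checked numerically. The sketch below is not the predictor-based objective described above; it assumes a simpler, standard measure, the sum of the per-bit entropies minus the joint entropy, which is zero exactly when the code components are statistically independent. The function names and the toy recoding are hypothetical.

```python
from math import log2


def entropy(probs):
    """Shannon entropy in bits; zero-probability outcomes contribute nothing."""
    return -sum(p * log2(p) for p in probs if p > 0)


def redundancy(joint, n_bits):
    """Sum of per-bit entropies minus the joint entropy, in bits."""
    bit_sum = 0.0
    for i in range(n_bits):
        p_one = sum(p for word, p in joint.items() if word[i] == 1)
        bit_sum += entropy([p_one, 1.0 - p_one])
    return bit_sum - entropy(joint.values())


# Raw data (an assumption for this sketch): two bits that always agree.
raw = {(0, 0): 0.5, (1, 1): 0.5}

# Hypothetical lossless recoding: each raw word maps to a distinct code word.
recode = {(0, 0): (0, 0), (1, 1): (1, 0)}
code = {recode[word]: p for word, p in raw.items()}

print(redundancy(raw, 2))
print(redundancy(code, 2))
```

The raw representation carries one bit of redundancy, while the recoded one carries none: its first bit is a fair coin and its second bit is constant, so the components are independent and the code is factorial for this toy distribution.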

Painsky, Rosset and Feder (2016, 2017) further studied this problem in the context of independent component analysis over finite alphabets.

In practice, they show that, with a careful implementation, the favorable properties of the order permutation can be achieved with asymptotically optimal computational complexity.
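The text here does not define the order permutation, so the following is only a rough sketch under an assumption: that it sorts the symbols by probability and assigns the k-th least probable symbol to the k-th binary code word in numeric order. The helper names are hypothetical, a plain sort is used, and no attempt is made to reproduce the authors' implementation or its complexity.

```python
from math import log2


def order_permutation(probs):
    """Map each symbol index to a code word index by ascending probability rank."""
    ranked = sorted(range(len(probs)), key=lambda s: probs[s])
    return {symbol: rank for rank, symbol in enumerate(ranked)}


def sum_bit_entropies(probs, n_bits, mapping):
    """Sum of the entropies (in bits) of each code bit under the given mapping."""
    total = 0.0
    for i in range(n_bits):
        # Probability that bit i of the assigned code word equals 1.
        p_one = sum(p for s, p in enumerate(probs) if (mapping[s] >> i) & 1)
        for q in (p_one, 1.0 - p_one):
            if q > 0:
                total -= q * log2(q)
    return total


# Toy distribution over four symbols, i.e. 2-bit words (an assumption).
probs = [0.4, 0.1, 0.1, 0.4]
identity = {s: s for s in range(len(probs))}
ordered = order_permutation(probs)

before = sum_bit_entropies(probs, 2, identity)
after = sum_bit_entropies(probs, 2, ordered)
print(ordered)
print(before, after)
```

For this toy distribution the reordering lowers the sum of the bit entropies, which is the quantity a factorial code drives down to the joint entropy.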