Multifactor dimensionality reduction

Multifactor dimensionality reduction (MDR) is a statistical approach, also used in automated machine learning approaches,[1] for detecting and characterizing combinations of attributes or independent variables that interact to influence a dependent or class variable.[2][3][4][5][6][7][8]

MDR was designed specifically to identify nonadditive interactions among discrete variables that influence a binary outcome and is considered a nonparametric and model-free alternative to traditional statistical methods such as logistic regression.

The basis of MDR can be illustrated with a simple example built around the exclusive OR (XOR) function. XOR is a logical operator that is commonly used in data mining and machine learning as an example of a function that is not linearly separable.

Table 1 gives the four possible combinations of the binary attributes X1 and X2 together with the class variable Y, where Y = XOR(X1, X2):

X1  X2  Y
0   0   0
0   1   1
1   0   1
1   1   0

A machine learning algorithm would need to discover or approximate the XOR function in order to accurately predict Y using information about X1 and X2.
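As a concrete illustration, the short Python sketch below (the helper name linearly_separable and the weight grid are invented for this example) encodes the rows of Table 1 and tries a coarse grid of linear decision boundaries; none of them classifies all four rows correctly. A finite grid search cannot prove non-separability in general, but for XOR it is known that no linear boundary exists.

```python
import itertools

# Rows of Table 1: (X1, X2, Y) with Y = XOR(X1, X2)
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]

def linearly_separable(rows):
    """Return True if some boundary w1*x1 + w2*x2 + b > 0 reproduces Y exactly."""
    grid = [i / 4 for i in range(-8, 9)]  # coarse grid of candidate weights and bias
    for w1, w2, b in itertools.product(grid, repeat=3):
        if all((w1 * x1 + w2 * x2 + b > 0) == (y == 1) for x1, x2, y in rows):
            return True
    return False

print(linearly_separable(data))  # False: no linear boundary separates the XOR classes
```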

An alternative strategy would be to first change the representation of the data using constructive induction to facilitate predictive modeling.

The MDR approach does this by pooling the combinations of X1 and X2 into "high-risk" and "low-risk" groups according to how often each combination occurs with Y = 1 versus Y = 0, collapsing the two attributes into a single constructed attribute Z. With this new representation (Table 2), the machine learning algorithm has much less work to do to find a good predictive function.
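A minimal sketch of this pooling step is shown below (the function name mdr_construct and the default threshold of 1.0 are illustrative assumptions, not part of any fixed implementation). Each observed combination of X1 and X2 is labelled high risk (Z = 1) when its count of Y = 1 observations exceeds its count of Y = 0 observations, and low risk (Z = 0) otherwise.

```python
from collections import Counter

def mdr_construct(rows, threshold=1.0):
    """Collapse the attribute pair (X1, X2) into one constructed attribute Z."""
    cases, controls = Counter(), Counter()
    for x1, x2, y in rows:
        (cases if y == 1 else controls)[(x1, x2)] += 1
    # Label each attribute combination high risk (1) or low risk (0) by its
    # ratio of Y = 1 to Y = 0 counts relative to the chosen threshold.
    risk = {combo: int(cases[combo] > threshold * controls[combo])
            for combo in set(cases) | set(controls)}
    return [(risk[(x1, x2)], y) for x1, x2, y in rows]

# Applied to the XOR data of Table 1, Z reproduces Y exactly.
data = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)]
print(mdr_construct(data))  # [(0, 0), (1, 1), (1, 1), (0, 0)]
```

In practice the threshold is usually taken to be the ratio of cases to controls in the full dataset (which equals 1 for balanced data), and MDR searches over candidate combinations of attributes to find the one whose constructed attribute best predicts the outcome.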

For example, decision trees, neural networks, or a naive Bayes classifier could then be used in combination with measures of model quality such as balanced accuracy[11][12] and mutual information.
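Balanced accuracy is the average of sensitivity and specificity, which makes it robust to imbalanced class sizes. The sketch below is a minimal stand-alone implementation of that definition.

```python
def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity (true-positive rate) and specificity (true-negative rate)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return (sensitivity + specificity) / 2

# A classifier that always predicts the majority class scores only 0.5 here,
# even though its raw accuracy on this imbalanced sample is 0.8.
print(balanced_accuracy([0, 0, 0, 0, 1], [0, 0, 0, 0, 0]))  # 0.5
```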

An important consideration with such flexible methods is overfitting; that is, machine learning algorithms are good at finding patterns in completely random data.
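The sketch below makes this concrete with a 1-nearest-neighbour classifier, chosen here purely as an example of a flexible learner: trained on completely random features and labels, it is perfect on its own training data yet performs at chance on fresh random data.

```python
import random

random.seed(0)

def one_nn(train, point):
    """Predict the label of the training row closest to `point` (squared Euclidean distance)."""
    return min(train, key=lambda r: (r[0] - point[0]) ** 2 + (r[1] - point[1]) ** 2)[2]

def make_random(n):
    """n rows of (feature1, feature2, label) with no relationship between features and label."""
    return [(random.random(), random.random(), random.randint(0, 1)) for _ in range(n)]

train, test = make_random(200), make_random(200)
train_acc = sum(one_nn(train, (x1, x2)) == y for x1, x2, y in train) / len(train)
test_acc = sum(one_nn(train, (x1, x2)) == y for x1, x2, y in test) / len(test)
print(train_acc, test_acc)  # ~1.0 on the training data, ~0.5 on new random data
```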

One approach is to estimate the generalizability of a model to independent datasets using methods such as cross-validation.[57][58][59]
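The sketch below shows the basic shape of k-fold cross-validation (the fit and score callables and the toy majority-class model are illustrative assumptions): the data are split into k folds, each fold is held out once for evaluation while a model is built from the remaining folds, and the held-out scores are averaged.

```python
def k_fold_cv(rows, fit, score, k=5):
    """Average held-out score over k train/test splits."""
    folds = [rows[i::k] for i in range(k)]  # simple interleaved split
    scores = []
    for i, held_out in enumerate(folds):
        train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        scores.append(score(fit(train), held_out))
    return sum(scores) / len(scores)

# Toy usage: a majority-class "model" scored by accuracy on (x, y) pairs.
fit = lambda rows: max((sum(y == c for _, y in rows), c) for c in {y for _, y in rows})[1]
score = lambda model, rows: sum(y == model for _, y in rows) / len(rows)
data = [(x, int(x % 4 == 0)) for x in range(20)]  # 15 zeros, 5 ones
print(k_fold_cv(data, fit, score))  # 0.75: the majority class is right 3 times out of 4
```

In MDR analyses, cross-validation is commonly paired with measures such as balanced accuracy to judge whether a detected interaction generalizes beyond the data used to find it.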

A central challenge is the scaling of MDR to big data such as that from genome-wide association studies (GWAS).