Factor analysis of mixed data

In statistics, factor analysis of mixed data or factorial analysis of mixed data (FAMD; in the French original, AFDM or Analyse Factorielle de Données Mixtes) is the factorial method devoted to data tables in which a group of individuals is described by both quantitative and qualitative variables.

It belongs to the exploratory methods developed by the French school called Analyse des données (data analysis) founded by Jean-Paul Benzécri.

The term mixed refers to the use of both quantitative and qualitative variables.

Roughly, we can say that FAMD works as a principal components analysis (PCA) for quantitative variables and as a multiple correspondence analysis (MCA) for qualitative variables.

When the data include both types of variables but the active variables are homogeneous, PCA or MCA can be used. Indeed, it is easy to include supplementary quantitative variables in MCA by means of the correlation coefficients between these variables and the factors on individuals (a factor on individuals is the vector gathering the coordinates of the individuals on a factorial axis); the representation obtained is a correlation circle (as in PCA).

Similarly, it is easy to include supplementary categorical variables in PCA.[1] For this, each category is represented by the center of gravity of the individuals who have it (as in MCA).

When the active variables are mixed, the usual practice is to discretize the quantitative variables (e.g. in surveys, age is usually transformed into age classes). The data thus obtained can be processed by MCA.
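The discretization step can be illustrated with a minimal sketch. The class boundaries (30, 45, 60) and the labels are hypothetical choices made for this example; in practice they depend on the survey.

```python
from bisect import bisect_right

# Hypothetical class boundaries and labels, chosen only for illustration.
BOUNDS = (30, 45, 60)
LABELS = ("<30", "30-44", "45-59", "60+")

def age_class(age):
    """Map a quantitative age to a qualitative age class."""
    return LABELS[bisect_right(BOUNDS, age)]

ages = [23, 30, 44, 59, 60, 81]
classes = [age_class(a) for a in ages]
```

After this transformation the age variable is qualitative and can enter an MCA alongside the other qualitative variables, at the cost of losing the within-class variation.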

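This practice reaches its limits.

The data include $K$ quantitative variables $k = 1, \dots, K$ and $Q$ qualitative variables $q = 1, \dots, Q$. Let $z$ be a quantitative variable, $r(z,k)$ the correlation coefficient between $z$ and variable $k$, and $\eta^2(z,q)$ the squared correlation ratio between $z$ and variable $q$.

In the PCA of $K$, we look for the function on the set $I$ of individuals (a function on $I$ assigns a value to each individual; this is the case for the initial variables and the principal components) that is the most correlated to all $K$ variables in the following sense: $\sum_k r^2(z,k)$ is maximal.

In MCA of $Q$, we look for the function on $I$ that is the most related to all $Q$ variables in the following sense: $\sum_q \eta^2(z,q)$ is maximal.

In FAMD of $\{K, Q\}$, we look for the function on $I$ that is the most related to all $K + Q$ variables in the following sense: $\sum_k r^2(z,k) + \sum_q \eta^2(z,q)$ is maximal.

In this criterion, both types of variables play the same role. The contribution of each variable in this criterion is bounded by 1.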
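The bound can be checked numerically. The sketch below assumes that the criterion adds, for a candidate function z on the individuals, the squared correlation coefficient with each quantitative variable and the squared correlation ratio with each qualitative variable. The function names and the toy data are hypothetical.

```python
from statistics import fmean, pvariance

def squared_correlation(z, x):
    """Squared correlation coefficient between two quantitative variables."""
    mz, mx = fmean(z), fmean(x)
    cov = fmean([(a - mz) * (b - mx) for a, b in zip(z, x)])
    return cov ** 2 / (pvariance(z) * pvariance(x))

def squared_correlation_ratio(z, labels):
    """Between-category variance of z divided by its total variance."""
    mz = fmean(z)
    between = 0.0
    for cat in set(labels):
        group = [a for a, lab in zip(z, labels) if lab == cat]
        between += len(group) * (fmean(group) - mz) ** 2
    return between / (len(z) * pvariance(z))

def criterion(z, quantitative, qualitative):
    """Sum of the per-variable contributions, each lying in [0, 1]."""
    return (sum(squared_correlation(z, x) for x in quantitative)
            + sum(squared_correlation_ratio(z, q) for q in qualitative))

# Hypothetical data: one candidate z, one quantitative and one qualitative variable.
z = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
g = ["a", "a", "b", "b", "c", "c"]
value = criterion(z, [x], [g])
```

Because each term is at most 1, the criterion for K quantitative and Q qualitative variables cannot exceed K + Q, so no single variable can dominate it.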

The representation of individuals is made directly from the factors on individuals.

The representation of quantitative variables is constructed as in PCA (correlation circle).

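The categories of the qualitative variables are represented as in MCA: each category lies at the centroid of the individuals who possess it. Note that we take the exact centroid and not, as is customary in MCA, the centroid up to a coefficient dependent on the axis (in MCA this coefficient is equal to the inverse of the square root of the eigenvalue; it would be inadequate in FAMD).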
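As a minimal sketch of the exact centroid, the function below averages the factor coordinates of the individuals who share a category, with no axis-dependent rescaling. The function name, the coordinates and the labels are hypothetical.

```python
def category_centroids(coords, labels):
    """Return {category: centroid}; coords[i] holds the factor coordinates
    of individual i and labels[i] its category."""
    sums, counts = {}, {}
    for point, lab in zip(coords, labels):
        acc = sums.setdefault(lab, [0.0] * len(point))
        for axis, value in enumerate(point):
            acc[axis] += value
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: tuple(s / counts[lab] for s in acc) for lab, acc in sums.items()}

# Hypothetical coordinates of four individuals on the first two axes.
coords = [(1.0, 0.5), (3.0, -0.5), (-2.0, 1.0), (-2.0, -1.0)]
labels = ["a", "a", "b", "b"]
centroids = category_centroids(coords, labels)
```

Plotted on the same axes as the individuals, each category then sits in the middle of the cloud of individuals who possess it.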

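The representation of variables is called the relationship square. The coordinate of a qualitative variable $j$ along axis $s$ is equal to the squared correlation ratio between the variable $j$ and the factor of rank $s$, denoted $\eta^2(j,s)$. The coordinate of a quantitative variable $k$ along axis $s$ is equal to the squared correlation coefficient between the variable $k$ and the factor of rank $s$, denoted $r^2(k,s)$.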
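The coordinates of the relationship square can be sketched on a toy table. The example assumes a common way of carrying out FAMD: quantitative variables are standardized, each category indicator is centred and divided by the square root of the category proportion, and a PCA of the encoded table is performed. The first axis is obtained here by power iteration. The data, the names and the encoding are assumptions of this sketch, not a description of any particular package.

```python
import math

# Hypothetical toy data: 6 individuals, 2 quantitative and 1 qualitative variable.
quant = {"x1": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
         "x2": [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]}
qual = {"g": ["a", "a", "b", "b", "c", "c"]}
n = 6

def standardize(values):
    """Centre and scale to unit (population) variance."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return [(v - m) / sd for v in values]

# Encoded table: one column per quantitative variable and one per category.
# Each indicator column is centred and divided by sqrt(p), where p is the
# proportion of the category (assumed encoding).
columns = [(name, standardize(values)) for name, values in quant.items()]
for name, labels in qual.items():
    for cat in sorted(set(labels)):
        p = labels.count(cat) / n
        columns.append((name, [((lab == cat) - p) / math.sqrt(p) for lab in labels]))

def project(v):
    """Coordinates of the individuals on the direction v."""
    return [sum(w * col[i] for w, (_, col) in zip(v, columns)) for i in range(n)]

def first_factor(iterations=500):
    """Power iteration for the first axis of the PCA of the encoded table."""
    v = [float(j + 1) for j in range(len(columns))]
    for _ in range(iterations):
        f = project(v)
        w = [sum(a * b for a, b in zip(f, col)) / n for _, col in columns]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    f = project(v)
    return sum(x * x for x in f) / n, f  # eigenvalue, factor on individuals

eigenvalue, factor = first_factor()
z = [x / math.sqrt(eigenvalue) for x in factor]  # factor scaled to unit variance

# Coordinate of each variable on axis 1: squared covariance between z and each
# encoded column, summed over the columns that belong to the variable.
square = {}
for name, col in columns:
    square[name] = square.get(name, 0.0) + (sum(a * b for a, b in zip(z, col)) / n) ** 2
```

Under this encoding the coordinate of a quantitative variable is its squared correlation coefficient with the factor, the coordinate of a qualitative variable is its squared correlation ratio with the factor, and the coordinates sum to the eigenvalue of the axis. Each coordinate lies between 0 and 1, which is why the plot fits in a unit square.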

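The relationship indicators between the initial variables are combined in a so-called relationship matrix that contains, at the intersection of row $l$ and column $c$: the squared correlation coefficient between $l$ and $c$ if both variables are quantitative; the squared correlation ratio if one is qualitative and the other quantitative; and the indicator $\varphi^2$ between $l$ and $c$ if both are qualitative.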

A very small data set (Table 1) illustrates the operation and outputs of the FAMD.

Data were analyzed using the function FAMD of the R package FactoMineR.

The matrix shows an entanglement of the relationships between the two types of variables.

The representation of variables (relationship square, Figure 2) shows that the first axis (

; the representation of the categories (Figure 4) clarifies the nature of the relationship between

This example illustrates how the FAMD simultaneously analyses quantitative and qualitative variables.

Thus, it shows, in this example, a first dimension based on the two types of variables.

The original work on FAMD is due to Brigitte Escofier[2] and Gilbert Saporta.[3] This work was taken up again in 2002 by Jérôme Pagès.[4] A more complete presentation of FAMD in English is included in a book by Jérôme Pagès.

The method is implemented in the Python library prince.