The model follows an analysis-by-synthesis approach over a dataset of 3D example shapes of a single class of objects (e.g., face, hand).
In this way, we can extract meaningful statistics from the dataset and use them to represent new plausible shapes of the object's class.[1]
The question that initiated the research on 3DMMs was to understand how a visual system could handle the vast variety of images produced by a single class of objects and how these images can be represented.
The primary assumption in developing 3DMMs was that prior knowledge about object classes was crucial in vision.
3D Face Morphable Models (3DFMMs) are the most popular 3DMMs, since they were the first to be developed in the field of facial recognition.[2] The same approach has also been applied to the whole human body,[3] the hand,[4] the ear,[5] cars,[6] and animals.
The 3DFMM provides a way to represent face shape and texture disentangled from external factors, such as camera parameters and illumination.
The prior knowledge is statistically extracted from a database of 3D examples and used as a basis to represent or generate new plausible objects of that class.
Its effectiveness lies in the ability to efficiently encode this prior information, enabling the solution of otherwise ill-posed problems (such as single-view 3D object reconstruction).[22]
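To illustrate how the encoded prior regularizes such inverse problems, the following deliberately simplified Python sketch treats synthesis as a fixed linear map (a stand-in for a real rendering pipeline, which it is not) and recovers model coefficients by analysis-by-synthesis: the mismatch between the synthesized and observed data is minimized together with a prior term. All names and dimensions here are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 40                                          # number of model coefficients
synthesize = rng.standard_normal((1000, k))     # stand-in for a rendering pipeline
observed = synthesize @ rng.standard_normal(k)  # the "input image" to explain

alpha = np.zeros(k)   # model coefficients to recover
lam = 0.1             # prior weight: PCA coefficients are roughly N(0, I)
for _ in range(500):
    residual = synthesize @ alpha - observed      # synthesis vs. observation
    grad = synthesize.T @ residual + lam * alpha  # data term + prior term
    alpha -= 1e-4 * grad                          # gradient descent step
```

The prior term keeps the coefficients near the statistically plausible region, which is what makes the inversion well-posed even when the data term alone admits many solutions.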
In general, 3D faces can be modeled by three variational components extracted from the face dataset: a shape model, an expression model, and an appearance model.[9]
The 3DFMM uses statistical analysis to define a statistical shape space, a vectorial space equipped with a probability distribution, or prior.[24]
To extract the prior from the example dataset, all the 3D faces must be in a dense point-to-point correspondence, meaning that each point has the same semantic meaning on every face (e.g., the tip of the nose or the corner of an eye).
In this way, by fixing a point, we can, for example, derive the probability distribution of the texture's red channel values over all the faces.
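For instance, given a hypothetical dataset of registered faces (random stand-in data below), dense correspondence makes per-point statistics meaningful, as this Python sketch shows:

```python
import numpy as np

# Hypothetical dataset: n registered faces, each with p vertices and one
# RGB color per vertex. Dense correspondence means textures[k, i] refers
# to the same facial point (e.g., the nose tip) for every face k.
n_faces, n_vertices = 200, 5000
rng = np.random.default_rng(0)
textures = rng.random((n_faces, n_vertices, 3))  # stand-in for real scans

vertex = 42                          # one fixed, semantically consistent point
red_values = textures[:, vertex, 0]  # red channel at that point, in all faces

# Empirical distribution of the red channel at this point across the dataset
print("mean:", red_values.mean(), "std:", red_values.std())
```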
Using a single generative model for the whole face leads to an imperfect representation of finer details.
A solution is to use local models of the face by segmenting important parts such as the eyes, mouth, and nose.
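One way to realize such local models, sketched below under assumed region index sets and stand-in data, is to fit an independent PCA model per segmented part and reconstruct each region with its own basis (a real system would additionally blend the region boundaries):

```python
import numpy as np
from sklearn.decomposition import PCA

# Hypothetical vertex-index sets for each segmented part of the face.
regions = {"eyes": np.arange(0, 1500),
           "nose": np.arange(1500, 3000),
           "mouth": np.arange(3000, 5000)}

# shapes: (n_faces, 3 * n_vertices) flattened coordinates in dense
# correspondence; random stand-in data here.
rng = np.random.default_rng(0)
shapes = rng.random((200, 3 * 5000))

# One local PCA model per region instead of a single global model,
# so each part retains more of its fine detail.
local_models = {}
for name, idx in regions.items():
    cols = np.concatenate([3 * idx, 3 * idx + 1, 3 * idx + 2])
    local_models[name] = (cols, PCA(n_components=20).fit(shapes[:, cols]))

# Reconstruct a face by projecting each region onto its local subspace.
face = shapes[0]
recon = np.empty_like(face)
for cols, pca in local_models.values():
    recon[cols] = pca.inverse_transform(pca.transform(face[cols][None]))[0]
```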
Depending on how identity and expression are combined, these methods can be classified as additive, multiplicative, or nonlinear. An additive model expresses a face shape as the sum of an identity contribution and an expression contribution,

$S = \bar{S} + E_{\mathrm{id}}\,\alpha_{\mathrm{id}} + E_{\mathrm{exp}}\,\alpha_{\mathrm{exp}},$

where $E_{\mathrm{id}}, E_{\mathrm{exp}}$ and $\alpha_{\mathrm{id}}, \alpha_{\mathrm{exp}}$ are the basis matrices and the coefficient vectors of the shape and expression spaces, respectively.[8] Two PCAs can be performed to learn two separate spaces for shape and expression.[26]
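A minimal Python sketch of the additive model (with hypothetical dimensions and random matrices standing in for learned PCA bases) makes the separation of identity and expression explicit:

```python
import numpy as np

n_vertices = 5000
k_id, k_exp = 80, 30   # identity / expression space dimensions (assumed)

rng = np.random.default_rng(0)
mean_shape = rng.random(3 * n_vertices)               # dataset mean face
E_id = rng.standard_normal((3 * n_vertices, k_id))    # identity basis
E_exp = rng.standard_normal((3 * n_vertices, k_exp))  # expression basis

alpha_id = rng.standard_normal(k_id)    # who the person is
alpha_exp = rng.standard_normal(k_exp)  # which expression they make

# Additive combination: S = mean + identity term + expression term
S = mean_shape + E_id @ alpha_id + E_exp @ alpha_exp
vertices = S.reshape(n_vertices, 3)  # back to per-vertex 3D coordinates
```

Because the two terms are added independently, the same alpha_id can be reused with different alpha_exp to animate a single identity across expressions.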
[26] In a multiplicative model, shape and expression can be combined in different ways.
Since the shapes are in dense point-to-point correspondence, a color value can be associated with each corresponding vertex. This one-to-one correspondence allows us to represent appearance analogously to the linear shape model: a texture is generated as $T = \bar{T} + E_t\,\beta$, where $\bar{T}$ is the mean texture of the dataset, $E_t$ is the texture basis matrix, and $\beta$ is the texture coefficient vector.
Facial recognition can be considered the field that originated the concepts that later converged into the formalization of morphable models. The idea of representing a novel face as a linear combination of example faces was introduced with eigenfaces and later widely employed for face recognition. However, this method had limitations: it was constrained to fixed poses and illumination and lacked an effective representation of shape differences. To address these limitations, researchers added an eigendecomposition of 2D shape variations between faces.
Landmark-based face warping was introduced by Craw and Cameron (1991),[31] and the first statistical shape model, the Active Shape Model, was proposed by Cootes et al.
Since these 2D methods were effective only for fixed poses and illumination, they were extended by Vetter and Poggio (1997)[34] to handle more diverse settings.
On the other hand, advances in 3D computer graphics showed that simulating pose and illumination variations was straightforward.
The combination of graphics methods with face modeling led to the first formulation of 3DMMs by Blanz and Vetter (1999).[8] The analysis-by-synthesis approach enabled the mapping of the 3D and 2D domains and a new representation of 3D shape and appearance.[9]
In the original definition of Blanz and Vetter,[8] the shape of a face is represented as the vector $S = (x_1, y_1, z_1, \dots, x_n, y_n, z_n)^\top \in \mathbb{R}^{3n}$ containing the 3D coordinates of its $n$ vertices; analogously, the texture is represented as the vector $T = (R_1, G_1, B_1, \dots, R_n, G_n, B_n)^\top \in \mathbb{R}^{3n}$ of the RGB color values associated with each corresponding vertex.
To extract the statistics from the dataset, they performed PCA to generate a shape space and a texture space of reduced dimension $m$. A new model can then be generated in the orthogonal basis of shape and texture eigenvectors $s_i$ and $t_i$:

$S_{\mathrm{new}} = \bar{S} + \sum_{i=1}^{m} \alpha_i s_i, \qquad T_{\mathrm{new}} = \bar{T} + \sum_{i=1}^{m} \beta_i t_i,$

where $\bar{S}$ and $\bar{T}$ are the mean shape and texture of the dataset, and $\alpha_i$, $\beta_i$ are the shape and texture coefficients.
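As a concrete sketch of this construction (array shapes are assumptions, with random stand-in data in place of registered scans), PCA via the singular value decomposition yields the mean, the eigenvectors $s_i$, and per-mode standard deviations from which plausible coefficients $\alpha_i$ can be sampled; the texture space is built the same way from the stacked $T$ vectors:

```python
import numpy as np

n_faces, n_vertices, m = 200, 5000, 99
rng = np.random.default_rng(0)
shapes = rng.random((n_faces, 3 * n_vertices))  # one flattened S per row

mean_S = shapes.mean(axis=0)
centered = shapes - mean_S

# PCA via SVD: the rows of Vt are the shape eigenvectors s_i.
_, sing_vals, Vt = np.linalg.svd(centered, full_matrices=False)
s = Vt[:m]                                    # top-m eigenvectors
sigma = sing_vals[:m] / np.sqrt(n_faces - 1)  # per-mode standard deviations

# Sampling alpha_i with spread sigma_i generates a new plausible face:
# S_new = mean_S + sum_i alpha_i s_i
alpha = sigma * rng.standard_normal(m)
S_new = mean_S + alpha @ s
```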