Structural similarity index measure

SSIM is a perception-based model that considers image degradation as perceived change in structural information, while also incorporating important perceptual phenomena, including both luminance masking and contrast masking terms.

Structural information is the idea that pixels have strong inter-dependencies, especially when they are spatially close.

These dependencies carry important information about the structure of the objects in the visual scene.

Luminance masking is a phenomenon whereby image distortions (in this context) tend to be less visible in bright regions, while contrast masking is a phenomenon whereby distortions become less visible where there is significant activity or "texture" in the image.

The model evolved, through collaboration with Hamid Sheikh and Eero Simoncelli, into the current version of SSIM, which was published in April 2004 in the IEEE Transactions on Image Processing.[1]

In addition to defining the SSIM quality index, the paper provides a general context for developing and evaluating perceptual quality measures, including connections to human visual neurobiology and perception, and direct validation of the index against human subject ratings.

The basic model was developed in the Laboratory for Image and Video Engineering (LIVE) at The University of Texas at Austin and further developed jointly with the Laboratory for Computational Vision (LCV) at New York University.

Further variants of the model have been developed in the Image and Visual Computing Laboratory at University of Waterloo and have been commercially marketed.

SSIM subsequently found strong adoption in the image processing community and in the television and social media industries.

These SSIM values can be aggregated over the full image by averaging, or by other pooling variations.

Choosing the third stabilizing constant as c_3 = c_2 / 2 leads to a simplification when combining the c and s components with equal exponents (α = β = γ = 1):

SSIM(x, y) = [(2 μ_x μ_y + c_1)(2 σ_xy + c_2)] / [(μ_x^2 + μ_y^2 + c_1)(σ_x^2 + σ_y^2 + c_2)]
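The simplified formula can be sketched directly in numpy. This is a minimal single-window version computed over a whole patch, with the default stabilizing constants from the original paper for 8-bit images; production implementations instead compute local statistics under an 11×11 Gaussian window.

```python
import numpy as np

# Stabilizing constants from the SSIM paper, for 8-bit images:
# c1 = (k1*L)^2, c2 = (k2*L)^2 with k1 = 0.01, k2 = 0.03, L = 255.
C1 = (0.01 * 255) ** 2
C2 = (0.03 * 255) ** 2

def ssim(x, y, c1=C1, c2=C2):
    """SSIM of two equal-size grayscale patches, using the simplified
    formula obtained with c3 = c2/2 and equal exponents."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    mu_x, mu_y = x.mean(), y.mean()
    var_x, var_y = x.var(), y.var()
    cov_xy = ((x - mu_x) * (y - mu_y)).mean()
    num = (2 * mu_x * mu_y + c1) * (2 * cov_xy + c2)
    den = (mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)
    return num / den
```

For identical patches the numerator and denominator coincide, so the score is exactly 1; any distortion pushes it below 1.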

SSIM satisfies the identity of indiscernibles, and symmetry properties, but not the triangle inequality or non-negativity, and thus is not a distance function.

However, under certain conditions, SSIM may be converted to a normalized root MSE measure, which is a distance function.

In the case of video quality assessment,[6] the authors propose to use only a subgroup of the possible windows to reduce the complexity of the calculation.
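The idea of evaluating only a subgroup of windows can be illustrated by striding the window grid so that windows are skipped. The window size and step below are illustrative choices, not values prescribed by the cited work.

```python
import numpy as np

def mean_ssim_sparse(x, y, win=8, step=16):
    """Mean SSIM over a sparse grid of win x win windows.
    Using step > win skips windows, an illustrative way of evaluating
    only a subgroup of the possible windows to cut computation."""
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    scores = []
    for i in range(0, x.shape[0] - win + 1, step):
        for j in range(0, x.shape[1] - win + 1, step):
            px, py = x[i:i+win, j:j+win], y[i:i+win, j:j+win]
            mu_x, mu_y = px.mean(), py.mean()
            cov = ((px - mu_x) * (py - mu_y)).mean()
            num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
            den = (mu_x**2 + mu_y**2 + c1) * (px.var() + py.var() + c2)
            scores.append(num / den)
    return float(np.mean(scores))
```

For video, the same function would be applied per frame and the per-frame scores averaged over the sequence.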

A more advanced form of SSIM, called Multiscale SSIM (MS-SSIM)[4] is conducted over multiple scales through a process of multiple stages of sub-sampling, reminiscent of multiscale processing in the early vision system.

It has been shown to perform equally well or better than SSIM on different subjective image and video databases.
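The multiscale scheme can be sketched as follows: the contrast-structure term is evaluated at every scale, the luminance term only at the coarsest scale, and the per-scale results are combined with exponential weights. This sketch uses global per-scale statistics and plain 2×2 average pooling in place of the low-pass filtering and local Gaussian windows of the published method, and it assumes the per-scale terms are nonnegative; the five weights are those reported by Wang, Simoncelli and Bovik.

```python
import numpy as np

def _cs_and_l(x, y, c1, c2):
    # Contrast-structure term and luminance term over a whole image.
    mu_x, mu_y = x.mean(), y.mean()
    cov = ((x - mu_x) * (y - mu_y)).mean()
    cs = (2 * cov + c2) / (x.var() + y.var() + c2)
    l = (2 * mu_x * mu_y + c1) / (mu_x**2 + mu_y**2 + c1)
    return cs, l

def _downsample(img):
    # 2x2 average pooling as a simple low-pass + subsample stage.
    h, w = (img.shape[0] // 2) * 2, (img.shape[1] // 2) * 2
    img = img[:h, :w]
    return (img[0::2, 0::2] + img[1::2, 0::2]
            + img[0::2, 1::2] + img[1::2, 1::2]) / 4.0

def ms_ssim(x, y, weights=(0.0448, 0.2856, 0.3001, 0.2363, 0.1333)):
    """Multiscale SSIM sketch: contrast-structure at every scale,
    luminance only at the coarsest scale, combined with the
    per-scale exponents given in `weights`."""
    c1, c2 = (0.01 * 255) ** 2, (0.03 * 255) ** 2
    x = np.asarray(x, dtype=np.float64)
    y = np.asarray(y, dtype=np.float64)
    score = 1.0
    for k, w in enumerate(weights):
        cs, l = _cs_and_l(x, y, c1, c2)
        score *= cs ** w
        if k == len(weights) - 1:
            score *= l ** w  # luminance enters only at the coarsest scale
        else:
            x, y = _downsample(x), _downsample(y)
    return score
```

With five scales, the input must be large enough to survive four halvings (e.g. 64×64 reduces to 4×4 at the coarsest scale).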

The authors mention that a 1/0/0 weighting (ignoring anything but edge distortions) leads to results that are closer to subjective ratings.

This suggests that edge regions play a dominant role in image quality perception.[10]

Structural dissimilarity (DSSIM) may be derived from SSIM, for example as DSSIM(x, y) = (1 − SSIM(x, y)) / 2, though it does not constitute a distance function, as the triangle inequality is not necessarily satisfied.

It is worth noting that the original version of SSIM was designed to measure the quality of still images.[7]

A common practice is to calculate the average SSIM value over all frames in the video sequence.[11][6][12]

The complex wavelet transform variant of SSIM (CW-SSIM) is designed to deal with issues of image scaling, translation and rotation.

It also allows adapting the scores to the intended viewing device, comparing video across different resolutions and contents.

SSIMULACRA and SSIMULACRA2 are variants of SSIM developed by Cloudinary with the goal of fitting the metric to subjective opinion data.

The variants operate in XYB color space and combine MS-SSIM with two types of asymmetric error maps for blockiness/ringing and smoothing/blur, common compression artifacts.

It is able to reflect radiologist preference for images much better than other SSIM variants tested.

SSIM has been repeatedly shown to significantly outperform MSE and its derivatives in accuracy, including in research by its own authors and others.[7][22][23][24][25][26]

A paper by Dosselmann and Yang claims that the performance of SSIM is "much closer to that of the MSE" than usually assumed.

While they do not dispute the advantage of SSIM over MSE, they demonstrate an analytical and functional dependency between the two metrics.

As an example, they cite Reibman and Poole, who found that MSE outperformed SSIM on a database containing packet-loss–impaired video.