[1][better source needed] Given two objects, A and B, each with n binary attributes, SMC is defined as:
where The simple matching distance (SMD), which measures dissimilarity between sample sets, is given by
[2][better source needed] SMC is linearly related to Hamann similarity:
is the squared Euclidean distance between the two objects (binary vectors) and n is the number of attributes.
In other contexts, where 0 and 1 carry equivalent information (symmetry), the SMC is a better measure of similarity.
For example, vectors of demographic variables stored in dummy variables, such as binary gender, would be better compared with the SMC than with the Jaccard index since the impact of gender on similarity should be equal, independently of whether male is defined as a 0 and female as a 1 or the other way around.
However, when we have symmetric dummy variables, one could replicate the behaviour of the SMC by splitting the dummies into two binary attributes (in this case, male and female), thus transforming them into asymmetric attributes, allowing the use of the Jaccard index without introducing any bias.
By using this trick, the Jaccard index can be considered as making the SMC a fully redundant metric.
The SMC remains, however, more computationally efficient in the case of symmetric dummy variables since it does not require adding extra dimensions.
The Jaccard index is also more general than the SMC and can be used to compare other data types than just vectors of binary attributes, such as probability measures.