[6] Two binary variables are considered positively associated if most of the data falls along the diagonal cells.
[3] Despite these antecedents, which predate Matthews's use by several decades, the term MCC is widely used in the fields of bioinformatics and machine learning.
The coefficient takes into account true and false positives and negatives and is generally regarded as a balanced measure which can be used even if the classes are of very different sizes.
[10] The MCC is in essence a correlation coefficient between the observed and predicted binary classifications; it returns a value between −1 and +1.
[11] MCC is closely related to the chi-square statistic for a 2×2 contingency table: $|\mathrm{MCC}| = \sqrt{\chi^2 / n}$, where n is the total number of observations.
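The definition and the chi-square relation can be checked numerically. The sketch below computes the MCC from the four confusion-matrix counts with the standard binary formula, (TP·TN − FP·FN) divided by the square root of the product of the four row and column sums, and compares its square, scaled by n, with Pearson's chi-square statistic (no continuity correction). The counts are hypothetical, and returning 0 when a row or column sum is zero is a common convention assumed here, not part of the definition.

```python
import math


def mcc(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews correlation coefficient from the four confusion-matrix counts."""
    num = tp * tn - fp * fn
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Undefined when any row or column sum is zero; 0.0 is a common convention.
    return num / den if den else 0.0


def chi_square_2x2(tp: int, tn: int, fp: int, fn: int) -> float:
    """Pearson chi-square statistic (no continuity correction) for a 2x2 table."""
    n = tp + tn + fp + fn
    table = [[tp, fn], [fp, tn]]  # rows: actual class, columns: predicted class
    rows = [sum(r) for r in table]
    cols = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = rows[i] * cols[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


# Hypothetical counts, chosen only for illustration.
tp, tn, fp, fn = 40, 35, 10, 15
n = tp + tn + fp + fn
print(mcc(tp, tn, fp, fn))
print(math.sqrt(chi_square_2x2(tp, tn, fp, fn) / n))  # same magnitude as the MCC
```

Because the numerator is TP·TN − FP·FN, a table whose counts lie mostly on the diagonal (TP and TN) yields a positive MCC, while one concentrated off the diagonal yields a negative one.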
While there is no perfect way of describing the confusion matrix of true and false positives and negatives by a single number, the Matthews correlation coefficient is generally regarded as being one of the best such measures.
For example, assigning every object to the larger set achieves a high proportion of correct predictions, but is not generally a useful classification.
[12][13] Markedness and Informedness correspond to different directions of information flow and generalize Youden's J statistic, the
[12] Some scientists claim the Matthews correlation coefficient to be the most informative single score to establish the quality of a binary classifier prediction in a confusion matrix context.
With these two labelled sets (actual and predictions), we can create a confusion matrix that summarizes the results of testing the classifier. In this confusion matrix, of the 8 cat pictures, the system judged that 2 were dogs, and of the 4 dog pictures, it predicted that 1 was a cat; the remaining 6 cats and 3 dogs were classified correctly.
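The cat and dog counts can be turned into a confusion matrix and an MCC value in a few lines of Python. This is a sketch: the two label lists are hypothetical and only reproduce the counts described in the text (8 actual cats, 2 of them predicted as dogs; 4 actual dogs, 1 of them predicted as a cat), and "cat" is arbitrarily chosen as the positive class.

```python
import math
from collections import Counter

# Hypothetical label lists reproducing the counts in the example;
# the ordering of the pictures is an assumption, only the counts matter.
actual = ["cat"] * 8 + ["dog"] * 4
predicted = ["cat"] * 6 + ["dog"] * 2 + ["cat"] * 1 + ["dog"] * 3

# Count each (actual, predicted) pair to fill the 2x2 confusion matrix.
pairs = Counter(zip(actual, predicted))

# Treat "cat" as the positive class.
tp = pairs[("cat", "cat")]  # cats recognized as cats
fn = pairs[("cat", "dog")]  # cats mistaken for dogs
fp = pairs[("dog", "cat")]  # dogs mistaken for cats
tn = pairs[("dog", "dog")]  # dogs recognized as dogs

mcc = (tp * tn - fp * fn) / math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
print(tp, fn, fp, tn)  # 6 2 1 3
print(round(mcc, 3))   # 0.478
```

Making "dog" the positive class instead swaps TP with TN and FP with FN, which leaves the MCC unchanged.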
This formula can be more easily understood by defining intermediate variables.[26] The same formula can be used to compute the MCC for the dog and cat example discussed above, treating its confusion matrix as a 2 × 2 multiclass example.
An alternative generalization of the Matthews correlation coefficient to more than two classes was given by Powers[12] by defining correlation as the geometric mean of Informedness and Markedness.
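A minimal sketch of the K-class MCC written with intermediate variables, assuming the commonly used generalization (Gorodkin's R_K statistic, which is also the form scikit-learn's `matthews_corrcoef` computes for multiclass input). With t_k the number of times class k truly occurred, p_k the number of times it was predicted, c the number of correct predictions and s the number of samples, MCC = (c·s − Σ p_k·t_k) / sqrt((s² − Σ p_k²)(s² − Σ t_k²)). For the binary case, a second helper illustrates Powers' view that the MCC is the geometric mean of Informedness (TPR + TNR − 1) and Markedness (PPV + NPV − 1); Powers' own multiclass definitions are not implemented here. Both function names are hypothetical.

```python
import math
from typing import Sequence


def multiclass_mcc(cm: Sequence[Sequence[int]]) -> float:
    """K-class MCC for a confusion matrix indexed as cm[actual][predicted]."""
    k = len(cm)
    s = sum(sum(row) for row in cm)                          # total samples
    c = sum(cm[i][i] for i in range(k))                      # correct predictions
    t = [sum(cm[i]) for i in range(k)]                       # times class i truly occurred
    p = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # times class j was predicted
    num = c * s - sum(pk * tk for pk, tk in zip(p, t))
    den = math.sqrt((s * s - sum(pk * pk for pk in p)) * (s * s - sum(tk * tk for tk in t)))
    return num / den if den else 0.0  # 0.0 for a degenerate matrix, by convention


def informedness_markedness(tp: int, tn: int, fp: int, fn: int) -> tuple:
    """Binary Informedness and Markedness."""
    informedness = tp / (tp + fn) + tn / (tn + fp) - 1  # TPR + TNR - 1 (Youden's J)
    markedness = tp / (tp + fp) + tn / (tn + fn) - 1    # PPV + NPV - 1
    return informedness, markedness


# Dog and cat example as a 2x2 matrix: rows = actual (cat, dog), columns = predicted.
cm = [[6, 2],
      [1, 3]]
print(round(multiclass_mcc(cm), 3))  # 0.478

# Both factors share the sign of TP*TN - FP*FN, so the sign is taken from either.
bm, mk = informedness_markedness(tp=6, tn=3, fp=1, fn=2)
print(round(math.copysign(math.sqrt(bm * mk), bm), 3))  # 0.478
```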
As explained by Davide Chicco in his paper "Ten quick tips for machine learning in computational biology" [14] (BioData Mining, 2017) and "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation" [28] (BMC Genomics, 2020), the Matthews correlation coefficient is more informative than F1 score and accuracy in evaluating binary classification problems, because it takes into account the balance ratios of the four confusion matrix categories (true positives, true negatives, false positives, false negatives).
And suppose also you made some mistakes in designing and training your machine learning classifier, and now you have an algorithm which always predicts positive.
On the contrary, to avoid these dangerous misleading illusions, there is another performance score that you can exploit: the Matthews correlation coefficient [40] (MCC).
By considering the proportion of each class of the confusion matrix in its formula, its score is high only if your classifier is doing well on both the negative and the positive elements.
By checking this value, instead of accuracy and F1 score, you would then be able to notice that your classifier is going in the wrong direction, and you would become aware that there are issues you ought to solve before proceeding.
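The effect of an always-positive classifier on the three scores can be reproduced with a short sketch. The data set of 90 positives and 10 negatives is hypothetical, and because such a classifier never predicts "negative", the MCC denominator is zero; reporting 0 in that case is a common convention assumed here.

```python
import math


def scores(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, F1 score and MCC from confusion-matrix counts."""
    n = tp + tn + fp + fn
    accuracy = (tp + tn) / n
    f1 = 2 * tp / (2 * tp + fp + fn)
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # A constant classifier makes (tn + fn) zero, so the MCC is undefined;
    # following a common convention, report 0 (no better than chance).
    mcc = (tp * tn - fp * fn) / den if den else 0.0
    return {"accuracy": accuracy, "f1": f1, "mcc": mcc}


# Hypothetical imbalanced test set: 90 positives, 10 negatives.
# Always predicting "positive" gets every positive right and every negative wrong.
result = scores(tp=90, tn=0, fp=10, fn=0)
print({k: round(v, 3) for k, v in result.items()})
# {'accuracy': 0.9, 'f1': 0.947, 'mcc': 0.0}
```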
Similarly to the previous case, if a researcher analyzed only these two score indicators, without considering the MCC, they would wrongly think the algorithm is performing quite well in its task, and would have the illusion of being successful.
Acting as an alarm, the MCC would be able to inform the data mining practitioner that the statistical model is performing poorly.
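The same alarm also fires when the classifier is not constant but still fails on the minority class. In the sketch below, the counts are hypothetical (90 positives and 10 negatives, with only 2 negatives recognized): accuracy and F1 score look good, while the MCC stays close to 0.

```python
import math


def accuracy_f1_mcc(tp: int, tn: int, fp: int, fn: int) -> tuple:
    """Return (accuracy, F1 score, MCC) for a binary confusion matrix."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / den if den else 0.0  # 0.0 by convention if undefined
    return accuracy, f1, mcc


# Hypothetical counts: most positives are found, but 8 of 10 negatives are missed.
acc, f1, m = accuracy_f1_mcc(tp=85, tn=2, fp=8, fn=5)
print(f"accuracy={acc:.2f}  F1={f1:.3f}  MCC={m:.3f}")
# accuracy=0.87  F1=0.929  MCC=0.170
```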
For these reasons, we strongly encourage to evaluate each test performance through the Matthews correlation coefficient (MCC), instead of the accuracy and the F1 score, for any binary classification problem.
Chicco's passage might be read as endorsing the MCC score in cases with imbalanced data sets.