In probability theory and information theory, the variation of information or shared information distance is a measure of the distance between two clusterings (partitions of elements).
It is closely related to mutual information; indeed, it is a simple linear expression involving the mutual information.
Unlike the mutual information, however, the variation of information is a true metric, in that it obeys the triangle inequality.[1][2][3]

Suppose we have two partitions $X$ and $Y$ of a set $A$ into disjoint subsets, namely $X = \{X_{1}, X_{2}, \ldots, X_{k}\}$ and $Y = \{Y_{1}, Y_{2}, \ldots, Y_{l}\}$.

Let:

$$n = \sum_{i} |X_{i}| = \sum_{j} |Y_{j}| = |A|,$$
$$p_{i} = |X_{i}| / n, \qquad q_{j} = |Y_{j}| / n,$$
$$r_{ij} = |X_{i} \cap Y_{j}| / n.$$

Then the variation of information between the two partitions is:

$$\mathrm{VI}(X; Y) = - \sum_{i,j} r_{ij} \left[ \log \frac{r_{ij}}{p_{i}} + \log \frac{r_{ij}}{q_{j}} \right].$$

This is equivalent to the shared information distance between the random variables $i$ and $j$ with respect to the uniform probability measure on $A$ defined by $\mu(B) := |B|/n$ for $B \subseteq A$.
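As a concrete illustration, here is a minimal Python sketch that evaluates this double sum directly from two partitions given as lists of disjoint blocks. The function name `variation_of_information` and the block-list representation are conventions of this sketch only, and natural logarithms are used throughout.

```python
import math

def variation_of_information(X, Y):
    """Variation of information between two partitions of the same set.

    X and Y are lists of blocks (collections of hashable elements);
    both must partition the same underlying set of n elements.
    """
    n = float(sum(len(x) for x in X))
    vi = 0.0
    for x in X:
        p = len(x) / n                       # p_i = |X_i| / n
        for y in Y:
            q = len(y) / n                   # q_j = |Y_j| / n
            r = len(set(x) & set(y)) / n     # r_ij = |X_i ∩ Y_j| / n
            if r > 0.0:
                # contributes -r_ij * [ log(r_ij/p_i) + log(r_ij/q_j) ]
                vi -= r * (math.log(r / p) + math.log(r / q))
    return vi

# Example: two partitions of {1, ..., 6}
X = [{1, 2, 3}, {4, 5, 6}]
Y = [{1, 2}, {3, 4}, {5, 6}]
print(variation_of_information(X, Y))  # strictly positive: the partitions differ
```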
We can rewrite this definition in terms that explicitly highlight the information content of this metric.
The set of all partitions of a set forms a compact lattice in which the partial order induces two operations, the meet $\wedge$ and the join $\vee$. The maximum $\overline{1}$ is the partition with only one block, i.e., all elements grouped together, and the minimum is $\overline{0}$, the partition consisting of all elements as singletons. The meet $X \wedge Y$ is easy to understand as the partition formed by all pairwise intersections of one block of $X$ with one block of $Y$. It then follows that $X \wedge Y \subseteq X$ and $X \wedge Y \subseteq Y$.
Let's define the entropy of a partition $X$ as

$$H(X) = -\sum_{i} p_{i} \log p_{i},$$

where $p_{i} = |X_{i}|/n$. Clearly, $H(\overline{1}) = 0$ and $H(\overline{0}) = \log n$. The entropy of a partition is a monotone function on the lattice of partitions, in the sense that $X \subseteq Y \Rightarrow H(X) \geq H(Y)$.
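Both the meet and this monotonicity can be checked mechanically. Below is a minimal Python sketch; the helper names `meet` and `partition_entropy`, and the block-list representation of partitions, are assumptions of this sketch rather than any standard API.

```python
import math

def meet(X, Y):
    """Meet of two partitions: all nonempty pairwise block intersections."""
    return [set(x) & set(y) for x in X for y in Y if set(x) & set(y)]

def partition_entropy(X):
    """H(X) = -sum_i p_i log p_i with p_i = |X_i| / n (natural log)."""
    n = float(sum(len(x) for x in X))
    return -sum((len(x) / n) * math.log(len(x) / n) for x in X)

X = [{1, 2, 3}, {4, 5, 6}]
Y = [{1, 4}, {2, 5}, {3, 6}]
XY = meet(X, Y)  # here: six singletons, i.e. the minimum partition of 6 elements

# The meet refines both arguments, so its entropy is at least as large.
assert partition_entropy(XY) >= partition_entropy(X)
assert partition_entropy(XY) >= partition_entropy(Y)
print(partition_entropy(XY))  # log 6 ~ 1.792, the entropy of the minimum
```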
Then the VI distance between $X$ and $Y$ is given by

$$\mathrm{VI}(X, Y) = 2 H(X \wedge Y) - H(X) - H(Y).$$

The difference $d(X, Y) \equiv |H(X) - H(Y)|$ is only a pseudo-metric, as $d(X, Y) = 0$ doesn't necessarily imply that $X = Y$. From the definition of $\overline{1}$, it is $d(X, \overline{1}) = H(X)$.

If in the Hasse diagram we draw an edge from every partition to the maximum $\overline{1}$ and assign it a weight equal to the VI distance between the given partition and $\overline{1}$, we can interpret the VI distance as basically an average of differences of edge weights to the maximum:

$$\mathrm{VI}(X, Y) = |\mathrm{VI}(X, \overline{1}) - \mathrm{VI}(X \wedge Y, \overline{1})| + |\mathrm{VI}(Y, \overline{1}) - \mathrm{VI}(X \wedge Y, \overline{1})| = d(X, X \wedge Y) + d(Y, X \wedge Y).$$

For $H(X)$ as defined above, it holds that the joint information of two partitions coincides with the entropy of the meet,

$$H(X, Y) = H(X \wedge Y),$$

and we also have that $d(X, X \wedge Y) = H(X \wedge Y \mid X)$ coincides with the conditional entropy of the meet (intersection) relative to $X$.
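One way to see the lattice form of the distance, sketched here using the identity $\mathrm{VI}(X; Y) = H(X \mid Y) + H(Y \mid X)$ stated below together with $H(X \mid Y) = H(X, Y) - H(Y)$ and $H(X, Y) = H(X \wedge Y)$:

$$\mathrm{VI}(X; Y) = H(X \mid Y) + H(Y \mid X) = \bigl(H(X, Y) - H(Y)\bigr) + \bigl(H(X, Y) - H(X)\bigr) = 2 H(X \wedge Y) - H(X) - H(Y).$$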
The variation of information satisfies

$$\mathrm{VI}(X; Y) = H(X) + H(Y) - 2 I(X, Y),$$

where $H(X)$ is the entropy of $X$, and $I(X, Y)$ is the mutual information between $X$ and $Y$ with respect to the uniform probability measure on $A$. This can be rewritten as

$$\mathrm{VI}(X; Y) = H(X, Y) - I(X, Y),$$

where $H(X, Y)$ is the joint entropy of $X$ and $Y$, or as

$$\mathrm{VI}(X; Y) = H(X \mid Y) + H(Y \mid X),$$

where $H(X \mid Y)$ and $H(Y \mid X)$ are the respective conditional entropies.
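If scikit-learn is available, the first identity gives a compact way to compute VI from clusterings encoded as label vectors; `sklearn.metrics.mutual_info_score` returns mutual information in nats, while the helper `label_entropy` and the label encoding are assumptions of this sketch.

```python
import math
from collections import Counter
from sklearn.metrics import mutual_info_score

def label_entropy(labels):
    """Entropy (natural log) of a clustering given as one label per element."""
    n = float(len(labels))
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())

# The partitions X = {{1,2,3},{4,5,6}} and Y = {{1,2},{3,4},{5,6}} as labels.
x = [0, 0, 0, 1, 1, 1]
y = [0, 0, 1, 1, 2, 2]

# VI(X;Y) = H(X) + H(Y) - 2 I(X,Y)
vi = label_entropy(x) + label_entropy(y) - 2.0 * mutual_info_score(x, y)
print(vi)  # agrees with the double-sum definition evaluated on the same data
```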
The variation of information can also be bounded, either in terms of the number of elements:

$$\mathrm{VI}(X; Y) \leq \log(n),$$

or with respect to a maximum number of clusters, $K^{*}$:

$$\mathrm{VI}(X; Y) \leq 2 \log(K^{*}).$$
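Both bounds follow from the identities above; a sketch of the reasoning: since $H(X) + H(Y) \geq H(X, Y) = H(X \wedge Y)$,

$$\mathrm{VI}(X; Y) = 2 H(X \wedge Y) - H(X) - H(Y) \leq H(X \wedge Y) \leq \log n,$$

and when each partition has at most $K^{*}$ blocks,

$$\mathrm{VI}(X; Y) = H(X \mid Y) + H(Y \mid X) \leq H(X) + H(Y) \leq 2 \log(K^{*}).$$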
To verify the triangle inequality $\mathrm{VI}(X; Z) \leq \mathrm{VI}(X; Y) + \mathrm{VI}(Y; Z)$, expand using the identity $\mathrm{VI}(X; Y) = H(X \mid Y) + H(Y \mid X)$. It suffices to prove

$$H(X \mid Z) \leq H(X \mid Y) + H(Y \mid Z).$$

The right side has a lower bound:

$$H(X \mid Y) + H(Y \mid Z) \geq H(X \mid Y, Z) + H(Y \mid Z) = H(X, Y \mid Z) \geq H(X \mid Z),$$

where the first step holds because conditioning cannot increase entropy, the equality is the chain rule, and the last step drops the nonnegative term $H(Y \mid X, Z)$. The lower bound is thus no less than the left side.
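As an empirical sanity check (not a proof), the following self-contained Python sketch verifies the triangle inequality on random clusterings; the helper `vi`, which implements $\mathrm{VI} = 2 H(X, Y) - H(X) - H(Y)$ on label vectors, is an assumption of this sketch.

```python
import math
import random
from collections import Counter

def vi(x, y):
    """VI between two clusterings given as label vectors (natural log)."""
    n = float(len(x))
    h = lambda counts: -sum((c / n) * math.log(c / n) for c in counts.values())
    # VI(X;Y) = 2 H(X,Y) - H(X) - H(Y)
    return 2.0 * h(Counter(zip(x, y))) - h(Counter(x)) - h(Counter(y))

random.seed(0)
for _ in range(1000):
    # Three random clusterings of 50 elements into at most 4 clusters.
    x, y, z = ([random.randrange(4) for _ in range(50)] for _ in range(3))
    assert vi(x, z) <= vi(x, y) + vi(y, z) + 1e-9  # tolerance for rounding
print("triangle inequality held on all random trials")
```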