Shattered set

The concept of shattered sets plays an important role in Vapnik–Chervonenkis theory, also known as VC-theory.

Shattering and VC-theory are used in the study of empirical processes as well as in statistical computational learning theory.

The class C shatters the set A if for each subset a of A, there is some element c of C such that Equivalently, C shatters A when their intersection is equal to A's power set: P(A) = { c ∩ A | c ∈ C }.

The set A is often assumed to be finite because, in empirical processes, we are interested in the shattering of finite sets of data points.

We will show that the class of all discs in the plane (two-dimensional space) does not shatter every set of four points on the unit circle, yet the class of all convex sets in the plane does shatter every finite set of points on the unit circle.

Let A be a set of four points on the unit circle and let C be the class of all discs.

To test where C shatters A, we attempt to draw a disc around every subset of points in A.

First, we draw a disc around the subsets of each isolated point.

Next, we try to draw a disc around every subset of point pairs.

Hence, any pair of opposite points cannot be isolated out of A using intersections with class C and so C does not shatter A.

As visualized below: Because there is some subset which can not be isolated by any disc in C, we conclude then that A is not shattered by C. And, with a bit of thought, we can prove that no set of four points is shattered by this C. However, if we redefine C to be the class of all elliptical discs, we find that we can still isolate all the subsets from above, as well as the points that were formerly problematic.

Thus, this specific set of 4 points is shattered by the class of elliptical discs.

Visualized below: With a bit of thought, we could generalize that any set of finite points on a unit circle could be shattered by the class of all convex sets (visualize connecting the dots).

To quantify the richness of a collection C of sets, we use the concept of shattering coefficients (also known as the growth function).

: The third property means that if C cannot shatter any set of cardinality N then it can not shatter sets of larger cardinalities.

, because as n gets smaller, there are fewer sets that could be omitted.

The extreme of this is S_C(0) (the shattering coefficient of the empty set), which must always be

These statements lends themselves to defining the VC dimension of a class C as: or, alternatively, as Note that

The VC dimension is usually defined as VC_0, the largest cardinality of points chosen that will still shatter A (i.e. n such that

Altneratively, if for any n there is a set of cardinality n which can be shattered by C, then

for all n and the VC dimension of this class C is infinite.

The set A of four particular points on the unit circle (the unit circle is shown in purple).