Margin (machine learning)

In machine learning, the margin of a single data point is its distance to a decision boundary. Many distances and decision boundaries can be appropriate, depending on the dataset and the goal.

Theoretical arguments based on the VC dimension explain why maximizing the margin (under suitable constraints) may be beneficial for machine learning and statistical inference algorithms.

One reasonable choice as the best hyperplane is the one that represents the largest separation, or margin, between the classes.

Hence, one should choose the hyperplane such that the distance from it to the nearest data point on each side is maximized.
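
The quantity being maximized can be made concrete with a small sketch. It assumes a hyperplane written as w·x + b = 0, labels in {-1, +1}, and Euclidean distance; the function name `geometric_margin`, the toy data, and the candidate hyperplanes are illustrative and not part of the article.

```python
import math

def geometric_margin(w, b, points, labels):
    """Smallest signed distance from any labeled point to the hyperplane w.x + b = 0.

    Positive: every point lies on its correct side, and the value is the margin.
    Zero or negative: the hyperplane fails to separate the classes.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(
        y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
        for x, y in zip(points, labels)
    )

# Hypothetical toy data: class -1 on the line x1 = 1, class +1 on the line x1 = 3.
points = [(1.0, 0.0), (1.0, 2.0), (3.0, 0.0), (3.0, 2.0)]
labels = [-1, -1, 1, 1]

# Hypothetical candidate hyperplanes, each given as (w, b).
candidates = {
    "x2 = 1": ((0.0, 1.0), -1.0),    # cuts through both classes
    "x1 = 1.5": ((1.0, 0.0), -1.5),  # separates, but sits close to class -1
    "x1 = 2": ((1.0, 0.0), -2.0),    # separates, midway between the classes
}

margins = {
    name: geometric_margin(w, b, points, labels)
    for name, (w, b) in candidates.items()
}
best = max(margins, key=margins.get)
print(margins)
print("largest margin:", best)
```

Among these three candidates, the hyperplane midway between the classes has the largest margin. A search over all hyperplanes, rather than a fixed list, is what a maximum-margin method performs.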

If such a hyperplane exists, it is known as the maximum-margin hyperplane, and the linear classifier it defines is known as a maximum margin classifier (or, equivalently, the perceptron of optimal stability).
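
For linearly separable data, the maximum-margin hyperplane is commonly written as an optimization problem. The notation below is introduced here as an assumption and does not come from the article: training points x_i with labels y_i in {-1, +1}, a normal vector w, an offset b, and n training points.

```latex
\max_{\mathbf{w},\, b}\ \min_{1 \le i \le n}\
  \frac{y_i\,(\mathbf{w}^\top \mathbf{x}_i + b)}{\lVert \mathbf{w} \rVert}
\quad\Longleftrightarrow\quad
\min_{\mathbf{w},\, b}\ \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2}
\ \ \text{subject to}\ \
y_i\,(\mathbf{w}^\top \mathbf{x}_i + b) \ge 1,\quad i = 1, \dots, n
```

The left-hand form states the goal directly: maximize the distance to the nearest point. The right-hand form fixes the scale of w so that the nearest points satisfy the constraint with equality, in which case the margin equals 1/‖w‖ and maximizing it amounts to minimizing ‖w‖.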

H1 does not separate the classes.
H2 does, but only with a small margin.
H3 separates them with the maximum margin.