Smooth maximum

In mathematics, a smooth maximum of an indexed family $x_1, \ldots, x_n$ of numbers is a smooth approximation to the maximum function $\max(x_1, \ldots, x_n)$, meaning a parametric family of functions $m_\alpha(x_1, \ldots, x_n)$ such that for every $\alpha$, the function $m_\alpha$ is smooth, and the family converges to the maximum function: $m_\alpha \to \max$ as $\alpha \to \infty$.

The concept of smooth minimum is similarly defined.

In many cases, a single family approximates both: maximum as the parameter goes to positive infinity, minimum as the parameter goes to negative infinity; in symbols, $m_\alpha \to \max$ as $\alpha \to \infty$ and $m_\alpha \to \min$ as $\alpha \to -\infty$.

The term can also be used loosely for a specific smooth function that behaves similarly to a maximum, without necessarily being part of a parametrized family.

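For large positive values of the parameter $\alpha$, the following formulation is a smooth, differentiable approximation of the maximum function. For negative values of the parameter that are large in absolute value, it approximates the minimum.

$$S_\alpha(x_1, \ldots, x_n) = \frac{\sum_{i=1}^{n} x_i e^{\alpha x_i}}{\sum_{i=1}^{n} e^{\alpha x_i}}$$

$S_\alpha$ has the following properties: $S_\alpha \to \max$ as $\alpha \to \infty$; $S_0$ is the arithmetic mean of its inputs; and $S_\alpha \to \min$ as $\alpha \to -\infty$.

The gradient of $S_\alpha$ is closely related to softmax and is given by

$$\nabla_{x_i} S_\alpha(x_1, \ldots, x_n) = \frac{e^{\alpha x_i}}{\sum_{j=1}^{n} e^{\alpha x_j}} \left[ 1 + \alpha \left( x_i - S_\alpha(x_1, \ldots, x_n) \right) \right].$$

This makes the softmax function useful for optimization techniques that use gradient descent.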

This operator is sometimes called the Boltzmann operator,[1] after the Boltzmann distribution.
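As a minimal sketch, assuming the usual definition of the Boltzmann operator as a softmax-weighted average, $S_\alpha(x) = \sum_i x_i e^{\alpha x_i} / \sum_i e^{\alpha x_i}$, the operator and its gradient might be computed as follows. The function names `boltzmann` and `boltzmann_grad` are hypothetical, and the exponent shift is an added numerical safeguard rather than part of the definition.

```python
import math


def boltzmann(xs, alpha):
    """Softmax-weighted average of xs with sharpness parameter alpha."""
    # Subtract the largest exponent so exp() cannot overflow; the shift
    # cancels between the numerator and the denominator.
    shift = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - shift) for x in xs]
    total = sum(weights)
    return sum(x * w for x, w in zip(xs, weights)) / total


def boltzmann_grad(xs, alpha):
    """Partial derivatives: softmax_i * (1 + alpha * (x_i - S_alpha))."""
    shift = max(alpha * x for x in xs)
    weights = [math.exp(alpha * x - shift) for x in xs]
    total = sum(weights)
    s = sum(x * w for x, w in zip(xs, weights)) / total
    return [(w / total) * (1.0 + alpha * (x - s)) for x, w in zip(xs, weights)]


xs = [1.0, 2.0, 3.0]
print(boltzmann(xs, 0.0))       # arithmetic mean of xs
print(boltzmann(xs, 50.0))      # close to max(xs)
print(boltzmann(xs, -50.0))     # close to min(xs)
print(boltzmann_grad(xs, 1.0))  # entries sum to 1
```

The gradient entries sum to 1 because adding a constant to every input shifts the output by the same constant.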

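Another smooth maximum is LogSumExp:

$$\mathrm{LSE}_\alpha(x_1, \ldots, x_n) = \frac{1}{\alpha} \log \sum_{i=1}^{n} \exp(\alpha x_i)$$

This can also be normalized if the $x_i$ are all non-negative, yielding a function with domain $[0, \infty)^n$ and range $[0, \infty)$:

$$g(x_1, \ldots, x_n) = \log \left( \sum_{i=1}^{n} \exp x_i - (n - 1) \right)$$

The $(n - 1)$ term corrects for the fact that $\exp(0) = 1$ by canceling out all but one zero exponential, and $\log 1 = 0$ if all $x_i$ are zero.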
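The two LogSumExp variants can be sketched as below, assuming the standard forms $\frac{1}{\alpha} \log \sum_i \exp(\alpha x_i)$ and $\log(\sum_i \exp x_i - (n - 1))$. The names `logsumexp_max` and `normalized_lse` are hypothetical, and factoring out the largest exponent is a common stability trick added here, not part of the definition.

```python
import math


def logsumexp_max(xs, alpha):
    """(1/alpha) * log(sum(exp(alpha * x))) for nonzero alpha."""
    # Factor out the largest exponent so exp() stays within range.
    shift = max(alpha * x for x in xs)
    tail = sum(math.exp(alpha * x - shift) for x in xs)
    return (shift + math.log(tail)) / alpha


def normalized_lse(xs):
    """log(sum(exp(x)) - (n - 1)), intended for non-negative inputs."""
    return math.log(sum(math.exp(x) for x in xs) - (len(xs) - 1))


xs = [1.0, 2.0, 3.0]
print(logsumexp_max(xs, 10.0))   # slightly above max(xs)
print(logsumexp_max(xs, -10.0))  # slightly below min(xs)
print(normalized_lse([0.0, 0.0, 0.0]))  # zero inputs map to zero
```

For positive $\alpha$ the result overestimates the maximum by at most $\log(n)/\alpha$, which is one way to choose $\alpha$ for a target accuracy.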

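The mellowmax operator[1] is defined as follows:

$$\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \left( \frac{1}{n} \sum_{i=1}^{n} \exp(\alpha x_i) \right)$$

It is a non-expansive operator. As $\alpha \to \infty$, it acts like a maximum. As $\alpha \to 0$, it acts like an arithmetic mean. As $\alpha \to -\infty$, it acts like a minimum.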

This operator can be viewed as a particular instantiation of the quasi-arithmetic mean.

It can also be derived from information theoretical principles as a way of regularizing policies with a cost function defined by KL divergence.
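A minimal sketch of mellowmax, assuming the definition $\mathrm{mm}_\alpha(x) = \frac{1}{\alpha} \log \left( \frac{1}{n} \sum_i \exp(\alpha x_i) \right)$ for nonzero $\alpha$, is given below. The function name `mellowmax` is hypothetical, and the exponent shift is an added safeguard against overflow.

```python
import math


def mellowmax(xs, alpha):
    """(1/alpha) * log(mean(exp(alpha * x))) for nonzero alpha."""
    # Factor out the largest exponent so exp() stays within range.
    shift = max(alpha * x for x in xs)
    mean_exp = sum(math.exp(alpha * x - shift) for x in xs) / len(xs)
    return (shift + math.log(mean_exp)) / alpha


xs = [1.0, 2.0, 3.0]
print(mellowmax(xs, 100.0))   # near max(xs)
print(mellowmax(xs, 1e-6))    # near the arithmetic mean of xs
print(mellowmax(xs, -100.0))  # near min(xs)
```

Compared with LogSumExp, the only difference is the $1/n$ factor inside the logarithm, which shifts the result by $-\log(n)/\alpha$ and makes the operator return $c$ when every input equals $c$.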

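The operator has previously been utilized in other areas, such as power engineering.[2]

Another smooth maximum is the p-norm:

$$\|(x_1, \ldots, x_n)\|_p = \left( \sum_{i=1}^{n} |x_i|^p \right)^{1/p}$$

which converges to $\|(x_1, \ldots, x_n)\|_\infty = \max_{1 \le i \le n} |x_i|$ as $p \to \infty$.

An advantage of the p-norm is that it is a norm. As such it is scale invariant (homogeneous): $\|(\lambda x_1, \ldots, \lambda x_n)\|_p = |\lambda| \cdot \|(x_1, \ldots, x_n)\|_p$, and it satisfies the triangle inequality.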
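The convergence and homogeneity of the p-norm can be illustrated with a short sketch, assuming the standard definition $(\sum_i |x_i|^p)^{1/p}$ for $p \ge 1$. The function name `p_norm` is hypothetical, and dividing by the largest magnitude is an added step to keep large powers within floating-point range.

```python
def p_norm(xs, p):
    """(sum(|x|^p))^(1/p) for p >= 1."""
    # Scale by the largest magnitude so large p does not overflow.
    m = max(abs(x) for x in xs)
    if m == 0.0:
        return 0.0
    return m * sum((abs(x) / m) ** p for x in xs) ** (1.0 / p)


xs = [3.0, -4.0]
for p in (1, 2, 8, 64):
    print(p, p_norm(xs, p))  # decreases toward max(|x|) = 4 as p grows

# Homogeneity: scaling the inputs by -2 scales the norm by |-2| = 2.
print(p_norm([-2.0 * x for x in xs], 2), 2.0 * p_norm(xs, 2))
```

Note that the p-norm approximates the maximum of the absolute values, so it matches $\max(x_1, \ldots, x_n)$ only when the largest entry in magnitude is non-negative.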

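The following binary operator is called the Smooth Maximum Unit (SMU):[3]

$$\max\nolimits_\varepsilon(a, b) = \frac{a + b + |a - b|_\varepsilon}{2} = \frac{a + b + \sqrt{(a - b)^2 + \varepsilon}}{2}$$

where $\varepsilon \ge 0$ is a parameter. As $\varepsilon \to 0$, $|\cdot|_\varepsilon \to |\cdot|$ and thus $\max_\varepsilon \to \max$.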
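A minimal sketch of this binary operator, assuming the form $\max_\varepsilon(a, b) = (a + b + \sqrt{(a - b)^2 + \varepsilon})/2$ with $\varepsilon \ge 0$, is shown below. The names `smooth_abs` and `smu_max` are hypothetical and chosen only for illustration.

```python
import math


def smooth_abs(t, eps):
    """Smooth stand-in for |t|: sqrt(t^2 + eps), with eps >= 0."""
    return math.sqrt(t * t + eps)


def smu_max(a, b, eps):
    """Smooth binary maximum built on max(a, b) = (a + b + |a - b|) / 2."""
    return (a + b + smooth_abs(a - b, eps)) / 2.0


print(smu_max(1.0, 3.0, 0.0))    # eps = 0 recovers the exact maximum
print(smu_max(1.0, 3.0, 0.01))   # slightly above the maximum
print(smu_max(2.0, 2.0, 0.01))   # the gap is largest where a == b
```

The approximation error is at most $\sqrt{\varepsilon}/2$, attained when the two inputs are equal.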

https://www.johndcook.com/soft_maximum.pdf

M. Lange, D. Zühlke, O. Holz, and T. Villmann, "Applications of lp-norms and their smooth approximations for gradient based learning vector quantization," in Proc. ESANN, Apr.

Figure: Smoothmax of (−x, x) versus x for various parameter values. Very smooth for α = 0.5, and sharper for α = 8.