Metalog distribution

Together with its transforms, the metalog family of continuous distributions is unique because it embodies all of the following properties: virtually unlimited shape flexibility; a choice among unbounded, semi-bounded, and bounded distributions; ease of fitting to data with linear least squares; simple, closed-form quantile function (inverse CDF) equations that facilitate simulation; a simple, closed-form PDF; and Bayesian updating in closed form in light of new data.

Moreover, like a Taylor series, metalog distributions may have any number of terms, depending on the degree of shape flexibility desired and other application needs.

[2] The history of probability distributions can be viewed, in part, as a progression of developments towards greater flexibility in shape and bounds when fitting to data.

In contrast, Bayes' theorem laid the foundation for the state-of-information, belief-based probability representations.

Moreover, many empirical and experimental data sets exhibited shapes that could not be well matched by the normal or other continuous distributions.

So began the search for continuous probability distributions with flexible shapes and bounds.

Early in the 20th century, the Pearson[5] family of distributions, which includes the normal, beta, uniform, gamma, Student's t, chi-square, F, and five others,[6] emerged as a major advance in shape flexibility.

Both families can represent the first four moments of data (mean, variance, skewness, and kurtosis) with smooth continuous curves.

Finally, their equations include intractable integrals and complex statistical functions, so that fitting to data typically requires iterative methods.

Shortly thereafter, Keelin[1] developed the family of metalog distributions, another instance of the QPD class, which is more shape-flexible than the Pearson and Johnson families, offers a choice of boundedness, has closed-form equations that can be fit to data with linear least squares, and has closed-form quantile functions, which facilitate Monte Carlo simulation.

[10] First, the resulting quantile function would have significant shape flexibility, governed by the coefficients

adjusts kurtosis primarily; and adding subsequent non-zero terms yields more nuanced shape refinements.
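The role of the coefficients can be made concrete with a short sketch. The Python code below evaluates the k-term unbounded metalog quantile function in its commonly published form, assuming NumPy is available. The function names metalog_basis and metalog_quantile are illustrative and do not come from the source. With only the first two terms, the expression reduces to the quantile function of the logistic distribution.

```python
import numpy as np

def metalog_basis(y, k):
    """Return the (len(y), k) matrix of metalog basis functions.

    y : cumulative probabilities strictly between 0 and 1
    k : number of terms (k >= 2)
    """
    y = np.atleast_1d(np.asarray(y, dtype=float))
    logit = np.log(y / (1.0 - y))   # logistic quantile kernel
    c = y - 0.5                     # centered probability
    cols = [np.ones_like(y), logit, c * logit, c]
    for j in range(5, k + 1):
        if j % 2 == 1:
            # odd terms: pure powers of (y - 0.5)
            cols.append(c ** ((j - 1) // 2))
        else:
            # even terms: powers of (y - 0.5) times the logit
            cols.append(c ** (j // 2 - 1) * logit)
    return np.column_stack(cols[:k])

def metalog_quantile(y, a):
    """Evaluate the unbounded metalog quantile function with coefficients a."""
    a = np.asarray(a, dtype=float)
    return metalog_basis(y, len(a)) @ a

# Four-term example: evaluate a few quantiles.
print(metalog_quantile([0.1, 0.5, 0.9], [3.0, 2.0, 1.0, 4.0]))
```

At y = 0.5 every term except the first vanishes, so the median of an unbounded metalog equals its first coefficient.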

Based on the above equations and the following transformations that enable a choice of bounds, the family of metalog distributions is composed of unbounded, semi-bounded, and bounded metalogs, along with their symmetric-percentile triplet (SPT) special cases.

To meet this need, Keelin used transformations to derive semi-bounded and bounded metalog distributions.
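A minimal sketch of such transformations follows, assuming the log transform for the semi-bounded case (with a lower bound) and the logit transform for the bounded case, which are the forms usually associated with these variants. For brevity the underlying unbounded metalog is limited to three terms. The function names and the example coefficients are illustrative only.

```python
import numpy as np

def unbounded_quantile(y, a):
    """Three-term unbounded metalog quantile function."""
    y = np.asarray(y, dtype=float)
    logit = np.log(y / (1.0 - y))
    return a[0] + a[1] * logit + a[2] * (y - 0.5) * logit

def semibounded_quantile(y, a, lower):
    """Log metalog: ln(x - lower) is metalog-distributed, so x > lower."""
    return lower + np.exp(unbounded_quantile(y, a))

def bounded_quantile(y, a, lower, upper):
    """Logit metalog: ln((x - lower) / (upper - x)) is metalog-distributed."""
    z = unbounded_quantile(y, a)
    # Algebraically equal to (lower + upper * e^z) / (1 + e^z).
    return lower + (upper - lower) / (1.0 + np.exp(-z))

y = np.linspace(0.01, 0.99, 99)
a = [0.0, 1.0, 0.2]   # hypothetical coefficients
print(semibounded_quantile(y, a, lower=5.0)[:3])
print(bounded_quantile(y, a, lower=5.0, upper=20.0)[:3])
```

Because both transforms are strictly increasing, they preserve the ordering of quantiles while confining the support to the chosen bounds.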

to be metalog-distributed, all members of the metalog family meet Keelin and Powley's[9] definition of a quantile-parameterized distribution and thus possess the properties thereof.

By contrast, the Pearson and Johnson families of distributions are limited to two shape parameters.

[14] An alternate fitting method, implemented as a linear program, determines the coefficients by minimizing the sum of absolute distances between the CDF and the data, subject to feasibility constraints.
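For contrast with the linear-program alternative, the ordinary least-squares fit can be sketched in a few lines. The example assumes CDF data given as pairs of a value x and a cumulative probability y, builds the metalog basis matrix in its commonly published form, and solves for the coefficients with NumPy's least-squares routine. The names metalog_basis and fit_metalog are illustrative, and the sketch does not check feasibility of the fitted coefficients.

```python
import numpy as np

def metalog_basis(y, k):
    """Return the (len(y), k) matrix of metalog basis functions (k >= 2)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    logit = np.log(y / (1.0 - y))
    c = y - 0.5
    cols = [np.ones_like(y), logit, c * logit, c]
    for j in range(5, k + 1):
        if j % 2 == 1:
            cols.append(c ** ((j - 1) // 2))
        else:
            cols.append(c ** (j // 2 - 1) * logit)
    return np.column_stack(cols[:k])

def fit_metalog(x, y, k):
    """Least-squares metalog coefficients for CDF data (x_i, y_i)."""
    basis = metalog_basis(y, k)
    coeffs, _, _, _ = np.linalg.lstsq(basis, np.asarray(x, dtype=float), rcond=None)
    return coeffs

# Synthetic check: data generated from known coefficients is recovered.
a_true = np.array([10.0, 2.0, 0.5, 1.0])
y = np.linspace(0.05, 0.95, 19)
x = metalog_basis(y, 4) @ a_true
print(fit_metalog(x, y, 4))
```

The problem is linear because the quantile function is linear in its coefficients, so no iterative optimization is needed.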

Keelin originally showed analogous results for a wide range of distributions[20] and has since provided further illustrations.

Three-term unbounded metalogs can be parameterized in closed form with their first three central moments.

[23] This property can be used, for example, to represent the sum of independent, non-identically distributed random variables.

Parameterizing a three-term metalog with these central moments yields a continuous distribution that exactly preserves these three moments, and accordingly provides a reasonable approximation to the shape of the distribution of the sum of independent random variables.
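The relationship between the three coefficients and the first three central moments can be sketched as follows. The forward formulas are the standard closed-form expressions for the three-term unbounded metalog. The inverse shown here is an assumption of this sketch: it solves the resulting cubic numerically with NumPy rather than with a closed-form cubic formula, and it picks the real root of smallest magnitude that yields a positive second coefficient. The function names are illustrative.

```python
import math
import numpy as np

def moments_from_coeffs(a1, a2, a3):
    """Mean, variance, and third central moment of a three-term metalog."""
    pi2 = math.pi ** 2
    mean = a1 + a3 / 2.0
    var = pi2 * a2 ** 2 / 3.0 + a3 ** 2 / 12.0 + pi2 * a3 ** 2 / 36.0
    third = pi2 * a2 ** 2 * a3 + pi2 * a3 ** 3 / 24.0
    return mean, var, third

def coeffs_from_moments(mean, var, third):
    """Recover (a1, a2, a3) from the first three central moments.

    Eliminating a2 from the variance and third-moment equations gives
    c * a3**3 - 3 * var * a3 + third = 0 with c = 1/4 + pi^2/24.
    """
    pi2 = math.pi ** 2
    c = 0.25 + pi2 / 24.0
    candidates = []
    for r in np.roots([c, 0.0, -3.0 * var, third]):
        if abs(r.imag) > 1e-9:
            continue  # skip complex roots
        a3 = float(r.real)
        a2_sq = (3.0 * var - a3 ** 2 * (0.25 + pi2 / 12.0)) / pi2
        if a2_sq > 0.0:
            candidates.append((abs(a3), a3, math.sqrt(a2_sq)))
    if not candidates:
        raise ValueError("no three-term metalog matches these moments")
    _, a3, a2 = min(candidates)
    return mean - a3 / 2.0, a2, a3

m, v, s = moments_from_coeffs(10.0, 2.0, 1.0)
print(coeffs_from_moments(m, v, s))
```

For a sum of independent random variables, the means, variances, and third central moments of the summands simply add, so the totals can be passed directly to coeffs_from_moments. The sketch does not verify that the resulting coefficients define a feasible distribution.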

Since their quantile functions are expressed in closed form, metalogs facilitate Monte Carlo simulation.

into the metalog quantile function (inverse CDF) produces random samples of
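This inverse-transform approach can be sketched with a three-term unbounded metalog. The example draws uniform random numbers with NumPy and passes them through the quantile function. The function name, the coefficients, and the clipping threshold that keeps the logarithm finite are assumptions of this sketch.

```python
import numpy as np

def sample_metalog3(a1, a2, a3, n, seed=0):
    """Draw n samples from a three-term unbounded metalog."""
    rng = np.random.default_rng(seed)
    u = rng.random(n)                       # uniform draws on [0, 1)
    u = np.clip(u, 1e-12, 1.0 - 1e-12)      # avoid log(0) at the endpoints
    logit = np.log(u / (1.0 - u))
    return a1 + a2 * logit + a3 * (u - 0.5) * logit

samples = sample_metalog3(10.0, 2.0, 1.0, n=200_000)
# The sample median should be near a1 and the sample mean near a1 + a3 / 2.
print(np.median(samples), samples.mean())
```

No rejection step or numerical inversion is required, which is the practical benefit of having the quantile function in closed form.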

Due to their shape flexibility, metalog distributions can be an attractive choice for eliciting and representing expert opinion.

In a classic paper, Howard (1970)[25] shows how the beta-binomial distribution can be used to update, according to Bayes' rule in closed form, uncertainty over the long-run frequency

In contrast, if the uncertainty of interest to be updated is defined not by a scalar probability over a discrete event (like the result of a coin toss) but by a probability density function over a continuous variable, metalog Bayesian updating may be used.

[17] Due to their shape and bounds flexibility, metalogs can be used to represent empirical or other data in virtually any field of human endeavor.

For data exploration and matching other probability distributions such as the sum of lognormals, eight to 12 terms is usually sufficient.

The case with 16 terms is infeasible for this data set, as indicated by the blank cell in the metalog panel.
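Feasibility here is commonly understood to mean that the fitted coefficients define a valid distribution, that is, a quantile function that is strictly increasing in the cumulative probability (equivalently, a PDF that is positive everywhere). The sketch below is a simple numerical check under that assumption, not a proof of feasibility: it evaluates the unbounded metalog quantile function on a fine grid and tests monotonicity. The function names, grid size, and example coefficients are illustrative.

```python
import numpy as np

def metalog_basis(y, k):
    """Return the (len(y), k) matrix of metalog basis functions (k >= 2)."""
    y = np.atleast_1d(np.asarray(y, dtype=float))
    logit = np.log(y / (1.0 - y))
    c = y - 0.5
    cols = [np.ones_like(y), logit, c * logit, c]
    for j in range(5, k + 1):
        if j % 2 == 1:
            cols.append(c ** ((j - 1) // 2))
        else:
            cols.append(c ** (j // 2 - 1) * logit)
    return np.column_stack(cols[:k])

def looks_feasible(a, n_grid=20001, eps=1e-6):
    """Grid-based check that the quantile function is strictly increasing."""
    a = np.asarray(a, dtype=float)
    y = np.linspace(eps, 1.0 - eps, n_grid)
    x = metalog_basis(y, len(a)) @ a
    return bool(np.all(np.diff(x) > 0.0))

print(looks_feasible([0.0, 1.0, 1.6]))   # moderate skewness term
print(looks_feasible([0.0, 1.0, 1.8]))   # skewness term too large
```

A grid check can miss violations between grid points or very close to 0 and 1, so a finer grid or an analytic condition is preferable when the result is borderline.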

For example, when applied to the steelhead weight data, the AIC ranking of metalog distributions from 2 to 16 terms, along with a wide range of classical distributions, identifies the 11-term log metalog as the best fit to this data.

Three-term metalog distributions
Four-term metalog distribution when
Bounded SPT metalog parameterized with CDF data and with lower and upper bounds.
10-term log metalog distribution over maximum annual river gauge height (ft) from 1920 to 2014 for the Williamson River below Sprague River confluence, Chiloquin, Oregon. Data source: USGS.
How metalogs converge to the standard normal distribution as the number of terms increases from 2 to 10
Weibull distributions (blue) closely approximated by nine-term semi-bounded metalog distributions (dashed, yellow)
For 3,474 steelhead trout caught and released on the Babine River in British Columbia during 2006-2010, empirical weight data (histogram) and 10-term log metalog PDF (blue curve) fit to this data by least squares.
Metalog panel for steelhead weight data