Multinomial distribution

For example, it models the probability of counts for each side of a k-sided die rolled n times.

The term "multinoulli" is sometimes used for the categorical distribution to emphasize this four-way relationship (so n determines the suffix, and k the prefix).

The binomial distribution generalizes this to the number of heads from performing n independent flips (Bernoulli trials) of the same coin.

Mathematically, we have k possible mutually exclusive outcomes, with corresponding probabilities p1, ..., pk, and n independent trials.
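As an illustration of this setup, the probability of observing counts $x_1, \dots, x_k$ (summing to $n$) is $\frac{n!}{x_1! \cdots x_k!} p_1^{x_1} \cdots p_k^{x_k}$. The following is a minimal sketch of that formula; the function name `multinomial_pmf` is chosen here for illustration only.

```python
from math import factorial, prod

def multinomial_pmf(counts, probs):
    """Probability of observing `counts` in sum(counts) independent trials."""
    if len(counts) != len(probs):
        raise ValueError("counts and probs must have the same length")
    n = sum(counts)
    # Multinomial coefficient n! / (x_1! * ... * x_k!), computed exactly in integers.
    coef = factorial(n)
    for x in counts:
        coef //= factorial(x)
    # Multiply by p_1^x_1 * ... * p_k^x_k.
    return coef * prod(p ** x for p, x in zip(probs, counts))

# Three outcomes with probabilities 0.2, 0.3, 0.5 observed 1, 2 and 3 times in 6 trials.
print(multinomial_pmf([1, 2, 3], [0.2, 0.3, 0.5]))
```

For these inputs the coefficient is 60 and the product of powers is 0.00225, so the printed value is approximately 0.135.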

Note: Since we’re assuming that the voting population is large, it is reasonable and permissible to think of the probabilities as unchanging once a voter is selected for the sample.

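The expected number of times the outcome $i$ was observed over $n$ trials is

$$\operatorname{E}(X_i) = n p_i.$$

The covariance matrix is as follows. Each diagonal entry is the variance of a binomially distributed random variable, $\operatorname{Var}(X_i) = n p_i (1 - p_i)$, and the off-diagonal entries are the covariances $\operatorname{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$.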
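The moments can be tabulated directly from $n$ and the probabilities, using the standard results $\operatorname{E}(X_i) = n p_i$, $\operatorname{Var}(X_i) = n p_i (1 - p_i)$ and $\operatorname{Cov}(X_i, X_j) = -n p_i p_j$ for $i \neq j$. The sketch below is one possible way to do so; the helper name `multinomial_moments` is illustrative.

```python
def multinomial_moments(n, probs):
    """Return the mean vector and covariance matrix as plain Python lists."""
    k = len(probs)
    mean = [n * p for p in probs]
    cov = [
        [
            # Diagonal: binomial variance; off-diagonal: negative covariance.
            n * probs[i] * (1 - probs[i]) if i == j else -n * probs[i] * probs[j]
            for j in range(k)
        ]
        for i in range(k)
    ]
    return mean, cov

mean, cov = multinomial_moments(10, [0.2, 0.3, 0.5])
for row in cov:
    print(row)
```

Because the counts always sum to $n$, every row of the covariance matrix sums to zero, which is a quick sanity check on the output.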

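The support of the multinomial distribution is the set of tuples of non-negative integers

$$\{(n_1, \dots, n_k) \in \mathbb{N}^k \mid n_1 + \cdots + n_k = n\}.$$

Its number of elements is $\binom{n+k-1}{k-1}$.

In matrix notation, $\operatorname{E}(\mathbf{X}) = n\mathbf{p}$ and $\operatorname{Var}(\mathbf{X}) = n\{\operatorname{diag}(\mathbf{p}) - \mathbf{p}\mathbf{p}^{\mathrm{T}}\}$, with $\mathbf{p}^{\mathrm{T}}$ the row vector transpose of the column vector $\mathbf{p}$.

Just as one can interpret the binomial distribution as (normalized) one-dimensional (1D) slices of Pascal's triangle, one can interpret the multinomial distribution as 2D (triangular) slices of Pascal's pyramid, or 3D/4D/+ (pyramid-shaped) slices of higher-dimensional analogs of Pascal's triangle.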
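The size of the support can be checked by brute force for small cases. The sketch below enumerates every tuple of $k$ non-negative integers summing to $n$ and compares the count with the stars-and-bars value $\binom{n+k-1}{k-1}$; the generator name `support` is illustrative.

```python
from math import comb

def support(n, k):
    """Yield every tuple of k non-negative integers that sums to n."""
    if k == 1:
        yield (n,)
        return
    for first in range(n + 1):
        # Fix the first count, then distribute the remainder over k - 1 categories.
        for rest in support(n - first, k - 1):
            yield (first,) + rest

points = list(support(4, 3))
print(len(points), comb(4 + 3 - 1, 3 - 1))  # prints "15 15"
```

For $k = 3$ the enumerated points form a triangular grid, which corresponds to the 2D slice of Pascal's pyramid.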

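This reveals an interpretation of the range of the distribution: discretized equilateral "pyramids" in arbitrary dimension, i.e. a simplex with a grid.[citation needed]

Similarly, just as one can interpret the binomial distribution as the polynomial coefficients of $(p + q)^n$ when expanded, one can interpret the multinomial distribution as the coefficients of $(p_1 + p_2 + \cdots + p_k)^n$ when expanded, noting that the terms must sum to 1.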
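The polynomial reading can be checked numerically. The sketch below expands $(p_1 + \cdots + p_k)^n$ one factor at a time, storing each monomial's coefficient under its exponent tuple, and then confirms that the terms sum to 1 when the $p_i$ are probabilities. The function name `expand_power` and the example values are illustrative.

```python
def expand_power(k, n):
    """Coefficients of (p_1 + ... + p_k)^n, keyed by exponent tuple."""
    terms = {(0,) * k: 1}
    for _ in range(n):
        nxt = {}
        for exps, coef in terms.items():
            for i in range(k):
                # Multiplying by p_i raises the i-th exponent by one.
                key = exps[:i] + (exps[i] + 1,) + exps[i + 1:]
                nxt[key] = nxt.get(key, 0) + coef
        terms = nxt
    return terms

coeffs = expand_power(3, 4)
probs = (0.2, 0.3, 0.5)
total = sum(
    c * probs[0] ** a * probs[1] ** b * probs[2] ** d
    for (a, b, d), c in coeffs.items()
)
print(coeffs[(2, 1, 1)], total)
```

The coefficient of $p_1^2 p_2 p_3$ is $4!/(2!\,1!\,1!) = 12$, and the total is 1 up to floating-point round-off.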

However, by symmetry, every point occupies exactly the same volume (except a negligible set on the boundary), so we obtain a probability density

The above concentration phenomenon can be easily generalized to the case where we condition upon linear constraints.

An analogous proof applies in this Diophantine problem of coupled linear equations in count variables

The goal of equivalence testing is to establish the agreement between a theoretical multinomial distribution and observed counting frequencies.

The equivalence test for Euclidean distance can be found in the textbook of Wellek (2010).[6]

The exact equivalence test for the specific cumulative distance was proposed in Frey (2009).

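[8] In the setting of a multinomial distribution, constructing confidence intervals for the difference between the proportions of observations from two events, $p_i - p_j$, requires incorporating the negative covariance between the sample estimators $\hat{p}_i = X_i/n$ and $\hat{p}_j = X_j/n$.

Some of the literature on the subject focuses on the use case of matched-pairs binary data, which requires careful attention when translating the formulas to the general case of $p_i - p_j$ for any multinomial distribution.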

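For an approximate $100(1-\alpha)\%$ confidence interval, the margin of error may incorporate the appropriate quantile from the standard normal distribution, as follows:

$$(\hat{p}_i - \hat{p}_j) \pm z_{\alpha/2} \sqrt{\frac{\hat{p}_i + \hat{p}_j - (\hat{p}_i - \hat{p}_j)^2}{n}},$$

where $\hat{p}_i$ and $\hat{p}_j$ are the sample proportions of the two events.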
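A Wald-type interval of this kind can be sketched as follows. The variance of the difference of two sample proportions from the same multinomial sample is $[p_i(1-p_i) + p_j(1-p_j) + 2 p_i p_j]/n$, which simplifies to $[p_i + p_j - (p_i - p_j)^2]/n$; the sketch plugs in the sample proportions. The function name `wald_ci_difference` and its parameters are illustrative assumptions.

```python
from math import sqrt
from statistics import NormalDist

def wald_ci_difference(counts, i, j, confidence=0.95):
    """Approximate CI for p_i - p_j from one vector of multinomial counts."""
    n = sum(counts)
    p_i, p_j = counts[i] / n, counts[j] / n
    diff = p_i - p_j
    # Plug-in standard error; the +2*p_i*p_j term from the negative covariance
    # is already folded into the simplified form p_i + p_j - (p_i - p_j)^2.
    se = sqrt((p_i + p_j - diff ** 2) / n)
    # Two-sided standard normal quantile, e.g. about 1.96 for 95%.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se

low, high = wald_ci_difference([20, 30, 50], i=2, j=0)
print(round(low, 4), round(high, 4))
```

Ignoring the covariance term would understate the standard error here, since the two sample proportions are negatively correlated.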

The posterior will be the calculations from above, but after adding 1/2 to each of the $k$ elements, leading to an overall increase of the sample size by $k/2$.

This was originally developed for a multinomial distribution with four events, and is known as Wald+2, for analyzing matched-pairs data (see the next section for more details).

For the case of matched-pairs binary data, a common task is to build the confidence interval of the difference of the proportion of the matched events.

Such scenarios can be represented using a two-by-two contingency table giving the number of elements that had each combination of events.

In such a case, the interest is in building a confidence interval for the difference of proportions from the marginals of the following (sampled) contingency table:

In this case, checking the difference in marginal proportions means we are interested in using the following definitions:

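Building a confidence interval for the difference of the marginal proportions is the same as building a confidence interval for the difference of the proportions from the secondary diagonal of the two-by-two contingency table, because the main-diagonal cell shared by both marginals cancels in the difference.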

The Wald confidence intervals from the previous section can be applied to this setting, and appear in the literature using alternative notations.

Specifically, the SE often presented is based on the contingency table frequencies instead of the sample proportions.

[11] One such modification is Agresti and Min’s Wald+2 (similar to some of their other works[13]), in which each cell frequency has an extra 1/2 added to it.

This leads to the following modified SE for the case of matched pairs data:
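As a rough illustration of the frequency-based standard error and of the adjustment that adds 1/2 to every cell, the sketch below assumes a two-by-two table laid out as `[[a, b], [c, d]]`, with `b` and `c` the secondary-diagonal (discordant) counts. These cell names, the function name `matched_pairs_wald` and the `add` parameter are illustrative assumptions, not notation taken from the text above. With $n = a + b + c + d$, the plain Wald standard error written in frequencies is $\sqrt{n(b+c) - (b-c)^2} / (n\sqrt{n})$; adding 1/2 to each cell replaces $n$ by $n + 2$ and $b + c$ by $b + c + 1$ while leaving $b - c$ unchanged.

```python
from math import sqrt
from statistics import NormalDist

def matched_pairs_wald(a, b, c, d, add=0.0, confidence=0.95):
    """CI for the difference of secondary-diagonal proportions of [[a, b], [c, d]].

    add=0.0 gives the plain Wald interval; add=0.5 adds 1/2 to every cell,
    which is the Wald+2 style adjustment (total sample size grows by 2).
    """
    a, b, c, d = (x + add for x in (a, b, c, d))
    n = a + b + c + d
    diff = (b - c) / n
    # Standard error expressed with cell frequencies rather than proportions.
    se = sqrt(n * (b + c) - (b - c) ** 2) / (n * sqrt(n))
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return diff - z * se, diff + z * se

print(matched_pairs_wald(40, 10, 5, 45))           # plain Wald
print(matched_pairs_wald(40, 10, 5, 45, add=0.5))  # Wald+2 style adjustment
```

The adjusted interval is centered at $(b - c)/(n + 2)$, slightly closer to zero than the unadjusted center $(b - c)/n$.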

Other modifications include Bonett and Price’s Adjusted Wald, and Newcombe’s Score.

The resulting outcome is the component $j$. The indicator vector $\{X_j = 1, X_k = 0 \text{ for } k \neq j\}$ is one observation from the multinomial distribution with probabilities $p_1, \dots, p_k$ and $n = 1$.
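One common way to select the component for a single trial is inverse-CDF sampling: draw a uniform number on $[0, 1)$ and pick the first category whose cumulative probability exceeds it. The sketch below assumes this procedure, which may differ in detail from the one the text refers to; the function names are illustrative. Summing the one-hot outcomes of $n$ independent trials then gives one multinomial observation.

```python
import random
from itertools import accumulate

def draw_category(probs, rng=random):
    """Return the index of one categorical draw via the inverse-CDF method."""
    u = rng.random()
    for j, cumulative in enumerate(accumulate(probs)):
        if u < cumulative:
            return j
    # Guard against floating-point round-off leaving the cumulative sum below 1.
    return len(probs) - 1

def sample_multinomial(n, probs, rng=random):
    """Sum n independent one-hot draws into a vector of counts."""
    counts = [0] * len(probs)
    for _ in range(n):
        counts[draw_category(probs, rng)] += 1
    return counts

rng = random.Random(0)  # fixed seed so the run is repeatable
print(sample_multinomial(1000, [0.2, 0.3, 0.5], rng))
```

Sorting the probabilities in descending order before sampling would shorten the average scan, at the cost of mapping indices back afterwards.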

If we sample from the multinomial distribution $\mathrm{Multinomial}(n; 0.2, 0.3, 0.5)$, and plot the heatmap of the samples within the 2-dimensional simplex (here shown as a black triangle), we notice that as $n \to \infty$, the distribution converges to a Gaussian around the point $(0.2, 0.3, 0.5)$, with the contours converging in shape to ellipses, with radii converging as $1/\sqrt{n}$. Meanwhile, the separation between the discrete points converges as $1/n$, and so the discrete multinomial distribution converges to a continuous Gaussian distribution.
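The two rates can be compared in a small simulation. The sketch below draws repeated samples with probabilities $(0.2, 0.3, 0.5)$ and measures the spread of the first sample proportion, which should shrink roughly as $1/\sqrt{n}$, alongside the grid spacing $1/n$. The function name `proportion_spread` and the replicate counts are illustrative choices.

```python
import random
from statistics import pstdev

def proportion_spread(n, probs, replicates, rng):
    """Standard deviation of the first category's sample proportion."""
    proportions = []
    for _ in range(replicates):
        # One multinomial sample: n categorical draws with the given weights.
        draws = rng.choices(range(len(probs)), weights=probs, k=n)
        proportions.append(draws.count(0) / n)
    return pstdev(proportions)

rng = random.Random(0)  # fixed seed so the run is repeatable
probs = [0.2, 0.3, 0.5]
for n in (100, 400, 1600):
    # Columns: n, simulated spread, grid spacing 1/n.
    print(n, round(proportion_spread(n, probs, 500, rng), 4), 1 / n)
```

Quadrupling $n$ should roughly halve the spread while dividing the grid spacing by four, so the grid becomes ever finer relative to the width of the distribution.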