Concentration parameter

In the case of multivariate Dirichlet distributions, there is some confusion over how to define the concentration parameter.

In the topic modelling literature, it is often defined as the sum of the individual Dirichlet parameters.[1] When discussing symmetric Dirichlet distributions (where the parameters are the same for all dimensions), however, it is often defined to be the value of the single Dirichlet parameter used in all dimensions[citation needed].
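The two conventions can differ by a factor equal to the number of dimensions, so it helps to see them side by side. The following is a minimal sketch, not a standard API: the function names `concentration_as_sum` and `symmetric_concentration`, and the example values of `K` and `alpha`, are illustrative assumptions.

```python
def concentration_as_sum(alphas):
    """Convention 1: the concentration parameter is the sum of all Dirichlet parameters."""
    return sum(alphas)


def symmetric_concentration(alphas):
    """Convention 2: for a symmetric Dirichlet, the concentration parameter
    is the single value shared by every dimension."""
    first = alphas[0]
    if any(a != first for a in alphas):
        raise ValueError("parameters differ, so the distribution is not symmetric")
    return first


# Hypothetical example: a symmetric Dirichlet over K = 4 dimensions.
K = 4
alpha = 0.5
alphas = [alpha] * K

total = concentration_as_sum(alphas)       # sum over all dimensions
single = symmetric_concentration(alphas)   # the shared per-dimension value
print(total, single)
```

For a symmetric distribution the two values are related by `total == K * single`, so a quoted "concentration parameter" is only interpretable once the convention in use is known.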

A typical vocabulary might have 100,000 words, leading to a 100,000-dimensional categorical distribution.

However, a coherent topic might only have a few hundred words with any significant probability mass.

With a larger vocabulary of around 1,000,000 words, an even smaller value of the concentration parameter, e.g. 0.0001, might be appropriate.
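The link between a small per-dimension parameter and a topic dominated by relatively few words can be checked by simulation. The sketch below draws one sample from a symmetric Dirichlet by normalising independent gamma variates, then counts how many words are needed to cover 90% of the probability mass. The helper names, the reduced vocabulary size of 10,000 (chosen only to keep the run short), the parameter values 0.01 and 1.0, and the 90% threshold are all illustrative assumptions rather than recommended settings.

```python
import random


def sample_symmetric_dirichlet(k, alpha, rng):
    """Draw one sample from a k-dimensional symmetric Dirichlet(alpha)
    by normalising k independent Gamma(alpha, 1) variates."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def words_covering(probs, mass=0.9):
    """Count how many of the largest components are needed to reach `mass`."""
    running = 0.0
    for count, p in enumerate(sorted(probs, reverse=True), start=1):
        running += p
        if running >= mass:
            return count
    return len(probs)


rng = random.Random(0)   # fixed seed so the run is repeatable
vocab_size = 10_000      # hypothetical, smaller than the vocabularies in the text

sparse_topic = sample_symmetric_dirichlet(vocab_size, 0.01, rng)
dense_topic = sample_symmetric_dirichlet(vocab_size, 1.0, rng)

sparse_count = words_covering(sparse_topic)
dense_count = words_covering(dense_topic)
print(sparse_count, dense_count)
```

With the small parameter, a small fraction of the vocabulary should account for most of the mass, whereas with a parameter of 1.0 the mass is spread over a large share of the words.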