The principle of transformation groups is a methodology for assigning prior probabilities in statistical inference problems, initially proposed by the physicist E. T. Jaynes.
Prior probabilities determined by this principle are objective in that they rely solely on the inherent characteristics of the problem, ensuring that any two individuals applying the principle to the same issue would assign identical prior probabilities.
Thus, this principle is integral to the objective Bayesian interpretation of probability.
The principle is applied by identifying symmetries, described by transformation groups, that allow a problem to be converted into an equivalent one, and then using these symmetries to calculate the prior probabilities.
For problems with discrete variables (such as dice, cards, or categorical data), symmetries are characterized by permutation groups and, in these instances, the principle simplifies to the principle of indifference.
In cases involving continuous variables, the symmetries may be represented by other types of transformation groups.
Determining the prior probabilities in such cases often requires solving a differential equation, which may not yield a unique solution.
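As a concrete illustration of that differential-equation step: in the scale-parameter example worked out below, invariance of the prior under rescaling leads to the functional equation $g(\sigma) = a\,g(a\sigma)$, and differentiating with respect to $a$ at $a = 1$ turns it into an ordinary differential equation. A minimal sketch, assuming SymPy is available (the name `g` is illustrative):

```python
# Scale invariance of the prior gives g(sigma) = a * g(a * sigma);
# differentiating with respect to a and setting a = 1 yields the ODE
#   sigma * g'(sigma) + g(sigma) = 0.
import sympy as sp

sigma = sp.symbols("sigma", positive=True)
g = sp.Function("g")

ode = sp.Eq(sigma * g(sigma).diff(sigma) + g(sigma), 0)
print(sp.dsolve(ode, g(sigma)))  # Eq(g(sigma), C1/sigma): unique up to scale
```

Here the solution is unique up to the constant $C_1$, but, as noted above, nothing guarantees uniqueness for an arbitrary transformation group.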
For a given coin flip, denote the probability of an outcome of heads as $p$ and of tails as $1 - p$.
In applying the desideratum, consider the information contained in the event of the coin flip as framed: it describes no distinction between heads and tails, so relabeling the two outcomes yields a problem containing exactly the same information. The desideratum then demands $P(\text{heads} \mid I) = P(\text{tails} \mid I)$, i.e. $p = 1 - p$, so $p = 1/2$.
This argument extends to N categories, to give the "flat" prior probability 1/N.
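The constraint-solving involved is simple enough to check mechanically. A minimal sketch, assuming SymPy, that imposes the relabeling symmetry and normalization on a prior over $N = 4$ categories (the symbol names are illustrative):

```python
# Permutation symmetry: any relabeling of the N outcomes describes an
# equivalent problem, so all prior probabilities must be equal; together
# with normalization this forces the flat assignment 1/N.
import sympy as sp

N = 4
p = sp.symbols(f"p0:{N}", nonnegative=True)

symmetry = [sp.Eq(p[0], p[i]) for i in range(1, N)]
normalization = [sp.Eq(sum(p), 1)]

print(sp.solve(symmetry + normalization, p, dict=True))
# [{p0: 1/4, p1: 1/4, p2: 1/4, p3: 1/4}]
```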
The symmetry argument above provides a consistency-based justification for the principle of indifference: if someone knows nothing about a discrete or countable set of outcomes except that they exist, yet does not assign them equal prior probabilities, then they are assigning different probabilities to problems containing the same information.
The requirement that the prior be a normalized distribution is a significant prerequisite for the final conclusion of a uniform prior, because uniform probability distributions can only be normalized over a finite domain.
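To make the normalization point explicit: a uniform assignment $p_i = c$ over a countably infinite set of outcomes cannot sum to one,

$$\sum_{i=1}^{\infty} c = \begin{cases} 0 & \text{if } c = 0, \\ \infty & \text{if } c > 0, \end{cases}$$

whereas over a finite set of $N$ outcomes the normalization $\sum_{i=1}^{N} c = Nc = 1$ fixes $c = 1/N$.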
Saying that $\sigma$ is a scale parameter means that the sampling distribution has the functional form

$$p(x \mid \sigma, I) = \frac{1}{\sigma} f\!\left(\frac{x}{\sigma}\right),$$

where, as before, $f(\cdot)$ is a normalized probability density function. The requirement that probabilities be finite and positive forces the condition $\sigma > 0$. Complete ignorance of the scale means that the problem is unchanged by rescaling, $x' = a x$ and $\sigma' = a \sigma$ for any $a > 0$. The Jacobian of this transformation is $a$, not 1, so the sampling probability changes to

$$p(x' \mid \sigma', I) = \frac{1}{\sigma'} f\!\left(\frac{x'}{\sigma'}\right),$$

which is invariant (i.e., has the same form before and after the transformation).
Furthermore, the prior probability changes to

$$p(\sigma' \mid I) = \frac{1}{a}\, p\!\left(\frac{\sigma'}{a} \,\middle|\, I\right),$$

which has the unique solution (up to proportionality)

$$p(\sigma \mid I) \propto \frac{1}{\sigma}.$$

This is the well-known Jeffreys prior for scale parameters, which is "flat" on the log scale, although the Jeffreys prior is usually derived by a different argument, based on the Fisher information function.
The fact that these two methods give the same results in this case does not imply they do in general.
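The invariance can also be verified numerically: under $g(\sigma) \propto 1/\sigma$, the (unnormalized) mass assigned to an interval $[\sigma_1, \sigma_2]$ is $\ln(\sigma_2/\sigma_1)$, which is unchanged when the interval is rescaled. A minimal sketch, assuming NumPy:

```python
# Check that the 1/sigma prior assigns the same mass to [s1, s2] and to
# [a*s1, a*s2], i.e. that it is invariant under a change of scale/units.
import numpy as np

def jeffreys_mass(s1: float, s2: float, n: int = 100_000) -> float:
    """Trapezoid-rule integral of 1/sigma over [s1, s2]."""
    sigma = np.linspace(s1, s2, n)
    return np.trapz(1.0 / sigma, sigma)

a = 7.3  # arbitrary rescaling factor, e.g. a change of measurement units
print(jeffreys_mass(0.5, 2.0))          # ~ log(4) ~= 1.3863
print(jeffreys_mass(a * 0.5, a * 2.0))  # same value after rescaling
```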
Edwin Jaynes used this principle to provide a resolution to Bertrand's Paradox[2] by stating his ignorance about the exact position of the circle.
This argument depends crucially on the information $I$; changing the information may result in a different probability assignment.
Suppose, for example, that the coin-flip problem as framed also admits the possibility of the coin landing on its side. The same argument using "complete ignorance," or more precisely the information actually described, gives

$$P(\text{heads} \mid I) = P(\text{tails} \mid I) = P(\text{side} \mid I) = \frac{1}{3}.$$

Intuition tells us that we should instead have $P(\text{side} \mid I)$ very close to zero.
It could reasonably be assumed that $P(\text{side} \mid I)$ takes some small definite value reflecting one's experience with real coins. Note that this new information probably wouldn't break the symmetry between "heads" and "tails," so that permutation would still apply in describing "equivalent problems," and we would require

$$P(\text{heads} \mid I) = P(\text{tails} \mid I) = \frac{1 - P(\text{side} \mid I)}{2}.$$

This is a good example of how the principle of transformation groups can be used to "flesh out" personal opinions.
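As a numerical illustration (the value of $P(\text{side} \mid I)$ below is a hypothetical personal assumption, not something the symmetry argument supplies):

```python
# "Fleshing out" an opinion: fix a personal value for P(side); the surviving
# heads/tails symmetry then splits the remaining probability mass evenly.
p_side = 0.01  # illustrative assumption from experience with real coins

p_heads = p_tails = (1.0 - p_side) / 2.0
print(p_heads, p_tails, p_side)  # 0.495 0.495 0.01
```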
If a prior probability assignment doesn't "seem right" according to what your intuition tells you, then there must be some "background information" that has not been put into the problem.
In some sense, combining the method of transformation groups with one's intuition can be used to "weed out" the actual assumptions one has.[4]
A strength of this principle lies in its application to continuous parameters, where the notion of "complete ignorance" is not as well-defined as in the discrete case.
However, if applied with infinite limits, it often gives improper prior distributions.
Note that the discrete case for a countably infinite set, such as $\{0, 1, 2, \ldots\}$, also produces an improper uniform prior.
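For example, the $1/\sigma$ prior derived above cannot be normalized over its full range:

$$\int_0^\infty \frac{d\sigma}{\sigma} \;=\; \lim_{\epsilon \to 0,\; L \to \infty} \ln\frac{L}{\epsilon} \;=\; \infty,$$

with the divergence coming from both ends of the interval.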
However, to be certain of avoiding incoherent results and paradoxes, the prior distribution should be approached via a well-defined and well-behaved limiting process.
An improper posterior distribution indicates that the data are so uninformative about the parameters that the prior probability of arbitrarily large values still matters in the final answer.
In some sense, an improper posterior means that the information contained in the data has not "ruled out" arbitrarily large values.
From a state of complete ignorance, only the data or some other form of additional information can rule out such absurdities.
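As an illustration of the data doing that work, consider the $1/\sigma$ scale prior with a normal likelihood of known mean: with no data, the "posterior" is the prior itself and its integral diverges, but even a single observation makes the posterior normalizable. A minimal sketch, assuming SciPy (the observation value is hypothetical):

```python
# One observation from a zero-mean normal with unknown scale sigma, combined
# with the improper 1/sigma prior, yields a normalizable (proper) posterior.
import numpy as np
from scipy import integrate

x = 1.7  # hypothetical observation

def unnormalized_posterior(sigma: float) -> float:
    likelihood = np.exp(-x**2 / (2 * sigma**2)) / (np.sqrt(2 * np.pi) * sigma)
    return likelihood / sigma  # likelihood times the 1/sigma prior

mass, _ = integrate.quad(unnormalized_posterior, 0, np.inf)
print(mass)  # finite: the datum has ruled out arbitrarily large sigma
```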