Hyperprior

As with the term hyperparameter, the prefix hyper distinguishes a hyperprior, which is a prior distribution on a hyperparameter, from a prior distribution on a parameter of the model for the underlying system.

Firstly, use of a hyperprior allows one to express uncertainty in a hyperparameter. Taking a fixed prior is an assumption; varying a hyperparameter of the prior allows one to do sensitivity analysis on this assumption; and placing a distribution on the hyperparameter allows one to express uncertainty in the assumption: "assume that the prior is of this form (this parametric family), but that we are uncertain as to precisely what the values of the parameters should be".
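As a rough illustration of the sensitivity-analysis step, the sketch below varies the hyperparameters of a Beta prior and reports how the posterior mean responds. The Beta-Bernoulli model, the data (7 successes in 10 trials), and the three hyperparameter settings are all assumptions chosen for the example, not taken from the text above.

```python
def posterior_mean(a, b, successes, trials):
    """Posterior mean of a Bernoulli success probability under a Beta(a, b) prior."""
    return (a + successes) / (a + b + trials)

# Hypothetical data and hypothetical hyperparameter settings (a, b).
successes, trials = 7, 10
settings = [(1.0, 1.0), (2.0, 2.0), (10.0, 10.0)]

# Sensitivity analysis: same data, different fixed priors.
means = {ab: posterior_mean(ab[0], ab[1], successes, trials) for ab in settings}

for (a, b), m in means.items():
    print(f"Beta({a}, {b}) prior -> posterior mean {m:.3f}")
```

The more concentrated the prior (larger a + b), the more the posterior mean is pulled from the observed frequency towards the prior mean of 0.5. A hyperprior would replace this hand-picked list of settings with a distribution over them.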

More abstractly, if one uses a hyperprior, then the prior distribution (on the parameter of the underlying model) itself is a mixture density: it is the weighted average of the various prior distributions (over different hyperparameters), with the hyperprior being the weighting.
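The mixture view can be sketched numerically. The example below assumes a Beta family for the prior and a small discrete hyperprior over its hyperparameters (both hypothetical choices made for illustration); the resulting prior on the parameter is the hyperprior-weighted average of the individual Beta densities.

```python
import math

def beta_pdf(x, a, b):
    """Density of Beta(a, b) at x, for 0 < x < 1."""
    log_norm = math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
    return math.exp(log_norm + (a - 1) * math.log(x) + (b - 1) * math.log(1 - x))

# Hypothetical discrete hyperprior: weights over hyperparameter pairs (a, b).
hyperprior = {(2.0, 5.0): 0.5, (5.0, 2.0): 0.3, (3.0, 3.0): 0.2}

def marginal_prior_pdf(x):
    """Prior on the parameter: weighted average of Beta densities."""
    return sum(w * beta_pdf(x, a, b) for (a, b), w in hyperprior.items())

# Sanity check with the midpoint rule: a mixture of densities is a density.
n = 10_000
total = sum(marginal_prior_pdf((i + 0.5) / n) for i in range(n)) / n
print(f"integral of the mixture prior over (0, 1): {total:.6f}")
```

With a continuous hyperprior the sum becomes an integral over hyperparameter space, but the structure is the same.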

In fact, the convex hull of the normal distributions (that is, the set of their finite mixtures) is dense in the space of all distributions in the sense of weak convergence, so in some cases one can approximate a given prior arbitrarily closely by using a family with a suitable hyperprior.

If one is using conjugate priors, then the parametric family, and hence the hyperparameter space, is preserved by moving to posteriors: as data arrives, the distribution changes but remains in the same family. The distribution thus evolves as a dynamical system, with each point of hyperparameter space moving to the updated hyperparameters, and over time it converges, just as the prior itself converges.
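A minimal sketch of this evolution, assuming a Beta prior on a Bernoulli success probability (a standard conjugate pair, chosen here for illustration). The data stream and the three starting points in hyperparameter space are hypothetical. Each observation moves a point (a, b) to its updated hyperparameters, and the sketch tracks only these points, not any reweighting of a hyperprior placed over them.

```python
def update(a, b, observation):
    """One conjugate update: a success adds 1 to a, a failure adds 1 to b."""
    return (a + 1, b) if observation else (a, b + 1)

# Hypothetical data stream: 200 observations, 70% successes.
data = [1, 0, 1, 1, 0, 1, 1, 1, 0, 1] * 20

# Three hypothetical starting points in hyperparameter space.
starts = [(1.0, 1.0), (20.0, 2.0), (2.0, 20.0)]

trajectories = []
for a, b in starts:
    path = [(a, b)]
    for obs in data:
        a, b = update(a, b, obs)
        path.append((a, b))
    trajectories.append(path)

# Mean of Beta(a, b) is a / (a + b); compare before and after the data.
initial_means = [p[0][0] / (p[0][0] + p[0][1]) for p in trajectories]
final_means = [p[-1][0] / (p[-1][0] + p[-1][1]) for p in trajectories]
print("initial means:", [round(m, 3) for m in initial_means])
print("final means:  ", [round(m, 3) for m in final_means])
```

The starting points disagree strongly about the success probability, yet after the same data their means lie close together, which is the convergence described above.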