Bayesian multivariate linear regression

In statistics, Bayesian multivariate linear regression is a Bayesian approach to multivariate linear regression, i.e. linear regression where the predicted outcome is a vector of correlated random variables rather than a single scalar random variable.

A more general treatment of this approach can be found in the article MMSE estimator.

Consider a regression problem where the dependent variable to be predicted is not a single real-valued scalar but an m-length vector of correlated real numbers.

As in the standard regression setup, there are n observations, where each observation i consists of k−1 explanatory variables, grouped into a vector $\mathbf{x}_i$ of length k (where a dummy variable with a value of 1 has been added to allow for an intercept coefficient).

This can be viewed as a set of m related regression problems for each observation i:

$$y_{i,1} = \mathbf{x}_i^{\mathsf T}\boldsymbol\beta_{1} + \epsilon_{i,1}$$
$$\vdots$$
$$y_{i,m} = \mathbf{x}_i^{\mathsf T}\boldsymbol\beta_{m} + \epsilon_{i,m}$$

where the set of errors $\{\epsilon_{i,1},\ldots,\epsilon_{i,m}\}$ are all correlated.

Equivalently, it can be viewed as a single regression problem where the outcome is a row vector $\mathbf{y}_i^{\mathsf T}$ and the regression coefficient vectors are stacked next to each other, as follows:

$$\mathbf{y}_i^{\mathsf T} = \mathbf{x}_i^{\mathsf T}\mathbf{B} + \boldsymbol\epsilon_{i}^{\mathsf T}.$$

The coefficient matrix $\mathbf{B}$ is a $k \times m$ matrix in which the coefficient vectors $\boldsymbol\beta_1,\ldots,\boldsymbol\beta_m$ for each regression problem are stacked horizontally:

$$\mathbf{B} = \begin{bmatrix} \boldsymbol\beta_1 & \cdots & \boldsymbol\beta_m \end{bmatrix} = \begin{bmatrix} \beta_{1,1} & \cdots & \beta_{1,m} \\ \vdots & \ddots & \vdots \\ \beta_{k,1} & \cdots & \beta_{k,m} \end{bmatrix}.$$

The noise vector $\boldsymbol\epsilon_i$ for each observation i is jointly normal, so that the outcomes for a given observation are correlated:

$$\boldsymbol\epsilon_i \sim N(0, \boldsymbol\Sigma_\epsilon).$$

We can write the entire regression problem in matrix form as:

$$\mathbf{Y} = \mathbf{X}\mathbf{B} + \mathbf{E},$$

where $\mathbf{Y}$ and $\mathbf{E}$ are $n \times m$ matrices. The design matrix $\mathbf{X}$ is an $n \times k$ matrix with the observations stacked vertically, as in the standard linear regression setup:

$$\mathbf{X} = \begin{bmatrix} \mathbf{x}_1^{\mathsf T} \\ \mathbf{x}_2^{\mathsf T} \\ \vdots \\ \mathbf{x}_n^{\mathsf T} \end{bmatrix} = \begin{bmatrix} x_{1,1} & \cdots & x_{1,k} \\ x_{2,1} & \cdots & x_{2,k} \\ \vdots & \ddots & \vdots \\ x_{n,1} & \cdots & x_{n,k} \end{bmatrix}.$$

The classical, frequentist linear least squares solution is to simply estimate the matrix of regression coefficients $\hat{\mathbf{B}}$ using the Moore–Penrose pseudoinverse:

$$\hat{\mathbf{B}} = (\mathbf{X}^{\mathsf T}\mathbf{X})^{-1}\mathbf{X}^{\mathsf T}\mathbf{Y}.$$
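As a concrete illustration, here is a minimal NumPy sketch of this least-squares estimate; the dimensions and data are made-up toy values, not part of the original article:

```python
import numpy as np

rng = np.random.default_rng(0)

n, k, m = 50, 3, 2                                               # observations, predictors (incl. intercept), outcomes
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])   # design matrix with an intercept column
B_true = rng.normal(size=(k, m))                                 # true coefficient matrix (k x m)
Y = X @ B_true + rng.normal(scale=0.1, size=(n, m))              # responses (independent noise used here for simplicity)

# Classical least-squares estimate: B_hat = (X^T X)^{-1} X^T Y
B_hat = np.linalg.solve(X.T @ X, X.T @ Y)
```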

To obtain the Bayesian solution, we need to specify the conditional likelihood and then find the appropriate conjugate prior.

As with the univariate case of linear Bayesian regression, we will find that we can specify a natural conditional conjugate prior (which is scale dependent).

Write the conditional likelihood as

$$\rho(\mathbf{Y}\mid\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp\!\left(-\tfrac{1}{2} \operatorname{tr}\!\left((\mathbf{Y}-\mathbf{X}\mathbf{B})^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right)\right).$$

We seek a natural conjugate prior—a joint density $\rho(\mathbf{B},\boldsymbol\Sigma_{\epsilon})$ which is of the same functional form as the likelihood. Because the likelihood is quadratic in $\mathbf{B}$, we re-write the likelihood so it is normal in $(\mathbf{B}-\hat{\mathbf{B}})$ (the deviation from the classical sample estimate).

Using the same technique as with Bayesian linear regression, we decompose the exponential term using a matrix-form of the sum-of-squares technique.

Here, however, we will also need to use matrix differential calculus (the Kronecker product and the vectorization transformation).

First, let us apply the sum-of-squares technique to obtain a new expression for the likelihood:

$$\rho(\mathbf{Y}\mid\mathbf{X},\mathbf{B},\boldsymbol\Sigma_{\epsilon}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-(n-k)/2} \exp\!\left(-\operatorname{tr}\!\left(\tfrac{1}{2}\mathbf{S}^{\mathsf T}\mathbf{S}\,\boldsymbol\Sigma_{\epsilon}^{-1}\right)\right) \, |\boldsymbol\Sigma_{\epsilon}|^{-k/2} \exp\!\left(-\tfrac{1}{2} \operatorname{tr}\!\left((\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right)\right),$$

where $\mathbf{S} = \mathbf{Y} - \mathbf{X}\hat{\mathbf{B}}$.
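To make the decomposition concrete, here is a small numerical check (a NumPy sketch with arbitrary toy matrices, not from the original article) that the residual sum of squares splits into the two trace terms above:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 30, 4, 3
X = rng.normal(size=(n, k))
Y = rng.normal(size=(n, m))
B = rng.normal(size=(k, m))                     # an arbitrary coefficient matrix
A = rng.normal(size=(m, m))
Sigma_inv = A @ A.T + np.eye(m)                 # arbitrary positive-definite stand-in for Sigma_eps^{-1}

B_hat = np.linalg.solve(X.T @ X, X.T @ Y)       # least-squares estimate
S = Y - X @ B_hat                               # residuals at the estimate

full = np.trace((Y - X @ B).T @ (Y - X @ B) @ Sigma_inv)
split = (np.trace(S.T @ S @ Sigma_inv)
         + np.trace((B - B_hat).T @ X.T @ X @ (B - B_hat) @ Sigma_inv))
assert np.isclose(full, split)                  # sum-of-squares decomposition holds
```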

We would like to develop a conditional form for the priors:

$$\rho(\mathbf{B},\boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\,\rho(\mathbf{B}\mid\boldsymbol\Sigma_{\epsilon}),$$

where $\rho(\boldsymbol\Sigma_{\epsilon})$ is an inverse-Wishart distribution and $\rho(\mathbf{B}\mid\boldsymbol\Sigma_{\epsilon})$ is some form of normal distribution in the matrix $\mathbf{B}$. This is accomplished using the vectorization transformation, which converts the likelihood from a function of the matrices $\mathbf{B}, \hat{\mathbf{B}}$ to a function of the vectors $\boldsymbol\beta = \operatorname{vec}(\mathbf{B}), \hat{\boldsymbol\beta} = \operatorname{vec}(\hat{\mathbf{B}})$.

Write

$$\operatorname{tr}\!\left((\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right) = \operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}\operatorname{vec}\!\left(\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right).$$

Let

$$\operatorname{vec}\!\left(\mathbf{X}^{\mathsf T}\mathbf{X}(\mathbf{B}-\hat{\mathbf{B}})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right) = (\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X})\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}}),$$

where $\mathbf{A}\otimes\mathbf{B}$ denotes the Kronecker product of matrices A and B, a generalization of the outer product which multiplies an $m \times n$ matrix by a $p \times q$ matrix to generate an $mp \times nq$ matrix, consisting of every combination of products of elements from the two matrices.

Then

$$\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}})^{\mathsf T}(\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X})\operatorname{vec}(\mathbf{B}-\hat{\mathbf{B}}) = (\boldsymbol\beta-\hat{\boldsymbol\beta})^{\mathsf T}(\boldsymbol\Sigma_{\epsilon}^{-1} \otimes \mathbf{X}^{\mathsf T}\mathbf{X})(\boldsymbol\beta-\hat{\boldsymbol\beta}),$$

which will lead to a likelihood which is normal in $(\boldsymbol\beta - \hat{\boldsymbol\beta})$.
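The vectorization identity can be checked numerically; the following NumPy sketch uses arbitrary illustrative dimensions (note that the column-stacking vec corresponds to Fortran-order flattening in NumPy):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, m = 20, 3, 2

X = rng.normal(size=(n, k))
D = rng.normal(size=(k, m))                       # stands in for B - B_hat
A = rng.normal(size=(m, m))
Sigma_inv = A @ A.T + np.eye(m)                   # positive-definite stand-in for Sigma_eps^{-1}

# Left-hand side: tr((B - B_hat)^T X^T X (B - B_hat) Sigma_eps^{-1})
lhs = np.trace(D.T @ X.T @ X @ D @ Sigma_inv)

# Right-hand side: vec(B - B_hat)^T (Sigma_eps^{-1} kron X^T X) vec(B - B_hat)
vec_D = D.flatten(order="F")                      # column-stacking vectorization
rhs = vec_D @ np.kron(Sigma_inv, X.T @ X) @ vec_D

assert np.isclose(lhs, rhs)                       # trace form equals the Kronecker quadratic form
```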

With the likelihood in a more tractable form, we can now find a natural (conditional) conjugate prior.

The natural conjugate prior using the vectorized variable $\boldsymbol\beta$ is of the form

$$\rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}) = \rho(\boldsymbol\Sigma_{\epsilon})\,\rho(\boldsymbol\beta\mid\boldsymbol\Sigma_{\epsilon}),$$

where

$$\rho(\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{W}^{-1}(\mathbf{V}_0,\boldsymbol\nu_0)$$

is an inverse-Wishart distribution and

$$\rho(\boldsymbol\beta\mid\boldsymbol\Sigma_{\epsilon}) \sim N(\boldsymbol\beta_0, \boldsymbol\Sigma_{\epsilon} \otimes \boldsymbol\Lambda_0^{-1})$$

is a multivariate normal distribution.

Using the above prior and likelihood, the posterior distribution can be expressed as:[1]

$$\rho(\boldsymbol\beta,\boldsymbol\Sigma_{\epsilon}\mid\mathbf{Y},\mathbf{X}) \propto |\boldsymbol\Sigma_{\epsilon}|^{-(\boldsymbol\nu_0+m+1)/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}(\mathbf{V}_0\boldsymbol\Sigma_{\epsilon}^{-1})\right) \times |\boldsymbol\Sigma_{\epsilon}|^{-k/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left((\mathbf{B}-\mathbf{B}_0)^{\mathsf T}\boldsymbol\Lambda_0(\mathbf{B}-\mathbf{B}_0)\,\boldsymbol\Sigma_{\epsilon}^{-1}\right)\right) \times |\boldsymbol\Sigma_{\epsilon}|^{-n/2} \exp\!\left(-\tfrac{1}{2}\operatorname{tr}\!\left((\mathbf{Y}-\mathbf{X}\mathbf{B})^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B})\,\boldsymbol\Sigma_{\epsilon}^{-1}\right)\right),$$

where $\operatorname{vec}(\mathbf{B}_0) = \boldsymbol\beta_0$.

This takes the form of an inverse-Wishart distribution times a matrix normal distribution:

$$\rho(\boldsymbol\Sigma_{\epsilon}\mid\mathbf{Y},\mathbf{X}) \sim \mathcal{W}^{-1}(\mathbf{V}_n,\boldsymbol\nu_n)$$

and

$$\rho(\mathbf{B}\mid\mathbf{Y},\mathbf{X},\boldsymbol\Sigma_{\epsilon}) \sim \mathcal{MN}_{k,m}(\mathbf{B}_n, \boldsymbol\Lambda_n^{-1}, \boldsymbol\Sigma_{\epsilon}),$$

where the posterior parameters are given by

$$\mathbf{B}_n = (\mathbf{X}^{\mathsf T}\mathbf{X} + \boldsymbol\Lambda_0)^{-1}(\mathbf{X}^{\mathsf T}\mathbf{X}\hat{\mathbf{B}} + \boldsymbol\Lambda_0\mathbf{B}_0),$$
$$\boldsymbol\Lambda_n = \mathbf{X}^{\mathsf T}\mathbf{X} + \boldsymbol\Lambda_0,$$
$$\mathbf{V}_n = \mathbf{V}_0 + (\mathbf{Y}-\mathbf{X}\mathbf{B}_n)^{\mathsf T}(\mathbf{Y}-\mathbf{X}\mathbf{B}_n) + (\mathbf{B}_n-\mathbf{B}_0)^{\mathsf T}\boldsymbol\Lambda_0(\mathbf{B}_n-\mathbf{B}_0),$$
$$\boldsymbol\nu_n = \boldsymbol\nu_0 + n.$$
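A minimal sketch of these posterior updates and a joint posterior draw, using NumPy and SciPy; the data and the prior hyperparameters (B0, Lambda0, V0, nu0) are hypothetical toy values chosen only for illustration:

```python
import numpy as np
from scipy.stats import invwishart, matrix_normal

def posterior_params(X, Y, B0, Lambda0, V0, nu0):
    """Conjugate posterior updates for Bayesian multivariate linear regression."""
    n, _ = X.shape
    XtX = X.T @ X
    B_hat = np.linalg.solve(XtX, X.T @ Y)                        # least-squares estimate
    Lambda_n = XtX + Lambda0                                      # updated row precision
    B_n = np.linalg.solve(Lambda_n, XtX @ B_hat + Lambda0 @ B0)   # posterior mean of B
    resid = Y - X @ B_n
    V_n = V0 + resid.T @ resid + (B_n - B0).T @ Lambda0 @ (B_n - B0)
    nu_n = nu0 + n
    return B_n, Lambda_n, V_n, nu_n

# Toy data (same generative setup as in the earlier sketch).
rng = np.random.default_rng(0)
n, k, m = 50, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
Y = X @ rng.normal(size=(k, m)) + rng.normal(scale=0.1, size=(n, m))

# Hypothetical weakly informative prior hyperparameters.
B0, Lambda0, V0, nu0 = np.zeros((k, m)), np.eye(k), np.eye(m), m + 2

B_n, Lambda_n, V_n, nu_n = posterior_params(X, Y, B0, Lambda0, V0, nu0)

# One joint posterior draw: Sigma ~ W^{-1}(V_n, nu_n), then B | Sigma ~ MN(B_n, Lambda_n^{-1}, Sigma).
Sigma_draw = invwishart.rvs(df=nu_n, scale=V_n)
B_draw = matrix_normal.rvs(mean=B_n, rowcov=np.linalg.inv(Lambda_n), colcov=Sigma_draw)
```

Because the prior is conjugate, repeated draws of this kind sample exactly from the posterior without any MCMC.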