Markov random field

In the domain of physics and probability, a Markov random field (MRF), Markov network or undirected graphical model is a set of random variables having a Markov property described by an undirected graph.

The underlying graph of a Markov random field may be finite or infinite.

When the joint probability density of the random variables is strictly positive, it is also referred to as a Gibbs random field, because, according to the Hammersley–Clifford theorem, it can then be represented by a Gibbs measure for an appropriate (locally defined) energy function.

The prototypical Markov random field is the Ising model; indeed, the Markov random field was introduced as the general setting for the Ising model.

[2] In the domain of artificial intelligence, a Markov random field is used to model various low- to mid-level tasks in image processing and computer vision.

form a Markov random field with respect to

[4] However, the above three Markov properties are equivalent for positive distributions[5] (those that assign only nonzero probabilities to the associated variables).

The relation between the three Markov properties is particularly clear in the following formulation: As the Markov property of an arbitrary probability distribution can be difficult to establish, a commonly used class of Markov random fields are those that can be factorized according to the cliques of the graph.

forms a Markov random field with respect to

Note, however, conflicting terminology is in use: the word potential is often applied to the logarithm of

Some MRF's do not factorize: a simple example can be constructed on a cycle of 4 nodes with some infinite energies, i.e. configurations of zero probabilities,[6] even if one, more appropriately, allows the infinite energies to act on the complete graph on

Any positive Markov random field can be written as exponential family in canonical form with feature functions

such that the full-joint distribution can be written as where the notation is simply a dot product over field configurations, and Z is the partition function: Here,

denotes the set of all possible assignments of values to all the network's random variables.

This expression of a Markov field as a logistic model is only possible if all clique factors are non-zero, i.e. if none of the elements of

The importance of the partition function Z is that many concepts from statistical mechanics, such as entropy, directly generalize to the case of Markov networks, and an intuitive understanding can thereby be gained.

In addition, the partition function allows variational methods to be applied to the solution of the problem: one can attach a driving force to one or more of the random variables, and explore the reaction of the network in response to this perturbation.

Thus, for example, one may add a driving term Jv, for each vertex v of the graph, to the partition function to get: Formally differentiating with respect to Jv gives the expectation value of the random variable Xv associated with the vertex v: Correlation functions are computed likewise; the two-point correlation is: Unfortunately, though the likelihood of a logistic Markov network is convex, evaluating the likelihood or gradient of the likelihood of a model requires inference in the model, which is generally computationally infeasible (see 'Inference' below).

A multivariate normal distribution forms a Markov random field with respect to a graph

if the missing edges correspond to zeros on the precision matrix (the inverse covariance matrix): such that As in a Bayesian network, one may calculate the conditional distribution of a set of nodes

in the Markov random field by summing over all possible assignments to

However, exact inference is a #P-complete problem, and thus computationally intractable in the general case.

Approximation techniques such as Markov chain Monte Carlo and loopy belief propagation are often more feasible in practice.

Some particular subclasses of MRFs, such as trees (see Chow–Liu tree), have polynomial-time inference algorithms; discovering such subclasses is an active research topic.

There are also subclasses of MRFs that permit efficient MAP, or most likely assignment, inference; examples of these include associative networks.

[9][10] Another interesting sub-class is the one of decomposable models (when the graph is chordal): having a closed-form for the MLE, it is possible to discover a consistent structure for hundreds of variables.

This form of the Markov network may be more appropriate for producing discriminative classifiers, which do not model the distribution over the observations.

CRFs were proposed by John D. Lafferty, Andrew McCallum and Fernando C.N.

They can be used to solve various computer vision problems which can be posed as energy minimization problems or problems where different regions have to be distinguished using a set of discriminating features, within a Markov random field framework, to predict the category of the region.

[17] Markov random fields were a generalization over the Ising model and have, since then, been used widely in combinatorial optimizations and networks.