Belief propagation

Belief propagation, also known as sum–product message passing, is a message-passing algorithm for performing inference on graphical models, such as Bayesian networks and Markov random fields.

Belief propagation is commonly used in artificial intelligence and information theory, and has demonstrated empirical success in numerous applications, including low-density parity-check codes, turbo codes, free energy approximation, and satisfiability.

When the joint distribution factors in a convenient way, belief propagation allows the marginals to be computed much more efficiently than by direct summation over the full joint.

Variants of the belief propagation algorithm exist for several types of graphical models (Bayesian networks and Markov random fields[5] in particular).[6]

The algorithm works by passing real-valued functions called messages along the edges between the nodes.

As the previous formula shows, the complete marginalization is reduced to a sum of products of simpler terms than those appearing in the full joint distribution.
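As a concrete illustration of this saving, consider a hypothetical three-variable chain whose joint distribution factors into two pairwise factors (the names f1, f2 and the state count are assumptions for the sketch). Distributing the sums over the factors avoids ever materializing the full joint table:

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical chain-structured model p(a, b, c) proportional to f1(a, b) * f2(b, c)
k = 4                                # number of states per variable
f1 = rng.random((k, k))              # factor over (a, b)
f2 = rng.random((k, k))              # factor over (b, c)

# Naive marginalization: build the full joint, then sum out b and c.
joint = np.einsum('ab,bc->abc', f1, f2)          # O(k^3) table
marg_naive = joint.sum(axis=(1, 2))              # p(a) up to normalization

# Exploiting the factorization: push the sum over c inside,
# so we only ever work with k-by-k tables: O(k^2) instead of O(k^3).
msg_c_to_b = f2.sum(axis=1)                      # sum over c of f2(b, c)
marg_fast = f1 @ msg_c_to_b                      # sum over b of f1(a, b) * msg(b)

assert np.allclose(marg_naive, marg_fast)
```

The inner sum over c plays exactly the role of a message in belief propagation: a function of b summarizing everything downstream of it.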

In the case where the graphical model is a tree, an optimal scheduling converges after computing each message exactly once (see next sub-section).

When the factor graph has cycles, such an optimal scheduling does not exist, and a typical choice is to update all messages simultaneously at each iteration.

Upon convergence (if convergence occurred), the estimated marginal distribution of each node is proportional to the product of all messages from adjoining factors (up to a normalization constant). Likewise, the estimated joint marginal distribution of the set of variables belonging to one factor is proportional to the product of the factor and the messages from those variables. In the case where the factor graph is acyclic (i.e. a tree or a forest), these estimated marginals converge to the true marginals in a finite number of iterations.

In the case when the factor graph is a tree, the belief propagation algorithm will compute the exact marginals.

Furthermore, with proper scheduling of the message updates, it will terminate after two full passes through the tree.
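The two-pass schedule on a tree can be sketched as follows, on a minimal three-variable chain with two pairwise factors (the factor names and sizes are hypothetical, chosen for illustration). A leaf-to-root pass followed by a root-to-leaf pass computes each message exactly once, and the resulting beliefs match brute-force marginals:

```python
import numpy as np

rng = np.random.default_rng(1)
k = 3
# Tree (chain) factor graph: x1 -- f12 -- x2 -- f23 -- x3
f12 = rng.random((k, k))             # factor over (x1, x2)
f23 = rng.random((k, k))             # factor over (x2, x3)

# Leaf-to-root pass (root = x2): leaves send uniform messages.
m_x1_f12 = np.ones(k)
m_f12_x2 = f12.T @ m_x1_f12          # sum over x1 of f12(x1, x2)
m_x3_f23 = np.ones(k)
m_f23_x2 = f23 @ m_x3_f23            # sum over x3 of f23(x2, x3)

# Root-to-leaf pass: a variable forwards the product of its other messages.
m_x2_f12 = m_f23_x2
m_f12_x1 = f12 @ m_x2_f12            # sum over x2 of f12(x1, x2) * m(x2)
m_x2_f23 = m_f12_x2
m_f23_x3 = f23.T @ m_x2_f23          # sum over x2 of f23(x2, x3) * m(x2)

# Beliefs: product of all incoming factor-to-variable messages, normalized.
def normalize(v):
    return v / v.sum()

b1 = normalize(m_f12_x1)
b2 = normalize(m_f12_x2 * m_f23_x2)
b3 = normalize(m_f23_x3)

# On a tree, these beliefs equal the exact marginals of the joint.
joint = np.einsum('ab,bc->abc', f12, f23)
joint /= joint.sum()
assert np.allclose(b1, joint.sum(axis=(1, 2)))
assert np.allclose(b2, joint.sum(axis=(0, 2)))
assert np.allclose(b3, joint.sum(axis=(0, 1)))
```

Note that every message appears exactly once: one upward and one downward pass suffice, as stated above.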

Although it was originally designed for acyclic graphical models, the belief propagation algorithm can be used in general graphs.

The algorithm is then sometimes called loopy belief propagation, because graphs typically contain cycles, or loops.[7]

Several sufficient (but not necessary) conditions for convergence of loopy belief propagation to a unique fixed point exist.[8]

There exist graphs which will fail to converge, or which will oscillate between multiple states over repeated iterations.
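A minimal sketch of loopy belief propagation with simultaneous ("flooding") updates on a three-node cycle follows; the pairwise potentials, the iteration count, and the per-update normalization are assumptions made for illustration, and on a cyclic graph the resulting beliefs are only approximations:

```python
import numpy as np

rng = np.random.default_rng(3)
k = 2
# Hypothetical pairwise model on a 3-cycle: edges (0,1), (1,2), (2,0).
neighbors = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
pots = {(0, 1): rng.random((k, k)) + 0.5,
        (1, 2): rng.random((k, k)) + 0.5,
        (2, 0): rng.random((k, k)) + 0.5}

def potential(i, j):
    # Return the edge potential oriented as (x_i, x_j).
    return pots[(i, j)] if (i, j) in pots else pots[(j, i)].T

# One message per directed edge, initialized uniform.
msgs = {(i, j): np.ones(k) / k for i in neighbors for j in neighbors[i]}

for _ in range(100):                 # all messages updated simultaneously
    new = {}
    for i in neighbors:
        for j in neighbors[i]:
            incoming = np.ones(k)
            for l in neighbors[i]:
                if l != j:           # product of messages into i, except from j
                    incoming = incoming * msgs[(l, i)]
            m = potential(i, j).T @ incoming    # sum over x_i
            new[(i, j)] = m / m.sum()           # normalize for stability
    msgs = new

# Approximate beliefs: product of incoming messages, normalized.
beliefs = {}
for i in neighbors:
    b = np.ones(k)
    for l in neighbors[i]:
        b = b * msgs[(l, i)]
    beliefs[i] = b / b.sum()
```

Because the graph has a cycle, no finite schedule computes each message once; iteration continues until the messages (hopefully) stop changing.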

The basic premise is to eliminate cycles by clustering them into single nodes.

A related problem is to find an assignment of the variables that maximizes the global function (i.e. the most probable values in a probabilistic setting); it can be defined using the arg max. An algorithm that solves this problem, often called the max-product algorithm, is nearly identical to belief propagation, with the sums replaced by maxima in the definitions.[9]

It is worth noting that inference problems like marginalization and maximization are NP-hard to solve exactly and approximately (at least for relative error) in a graphical model.

More precisely, the marginalization problem defined above is #P-complete and maximization is NP-complete.
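The maximization variant can be sketched on the same hypothetical two-factor chain used for marginalization (factor names and sizes are assumptions): maxima replace sums, and backpointers recorded during the forward pass recover the arg max assignment:

```python
import numpy as np

rng = np.random.default_rng(2)
k = 3
# Hypothetical chain model with pairwise factors f12(x1, x2), f23(x2, x3).
f12 = rng.random((k, k))
f23 = rng.random((k, k))

# Max-product messages toward the root x1: maxima replace the sums.
m_f23_x2 = f23.max(axis=1)                        # max over x3 of f23(x2, x3)
back_x3 = f23.argmax(axis=1)                      # maximizing x3 for each x2
scores = f12 * m_f23_x2[None, :]                  # f12(x1, x2) * m(x2)
m_f12_x1 = scores.max(axis=1)                     # max over x2
back_x2 = scores.argmax(axis=1)                   # maximizing x2 for each x1

# Backtrack from the root to read off the arg max assignment.
x1 = int(m_f12_x1.argmax())
x2 = int(back_x2[x1])
x3 = int(back_x3[x2])

# Check against a brute-force arg max over the full joint.
joint = np.einsum('ab,bc->abc', f12, f23)
assert (x1, x2, x3) == np.unravel_index(joint.argmax(), joint.shape)
```

On a chain this is exactly the Viterbi-style dynamic program; the message recursions are those of sum-product with each sum replaced by a max.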

The memory usage of belief propagation can be reduced through the use of the Island algorithm (at a small cost in time complexity).

The sum–product algorithm is related to the calculation of free energy in thermodynamics.[10]

Belief propagation algorithms are normally presented as message update equations on a factor graph, involving messages between variable nodes and their neighboring factor nodes and vice versa.

Considering messages between regions in a graph is one way of generalizing the belief propagation algorithm.[10]

There are several ways of defining the set of regions in a graph that can exchange messages.[14]

Improvements in the performance of belief propagation algorithms are also achievable by breaking the replica symmetry in the distributions of the fields (messages).

This generalization leads to a new kind of algorithm called survey propagation (SP), which has proved to be very efficient in NP-complete problems like satisfiability[1] and graph coloring.

The first sufficient convergence condition for Gaussian belief propagation (GaBP) was formulated by Weiss et al. in the year 2000: the algorithm converges when the information matrix A is diagonally dominant.

The second convergence condition was formulated by Johnson et al.[16] in 2006: the algorithm converges when the spectral radius of the matrix |I − D^(−1/2) A D^(−1/2)| is smaller than one, where D = diag(A).
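Both conditions are straightforward to check numerically. A sketch, using an example information matrix A chosen purely for illustration:

```python
import numpy as np

# Example information matrix A (values chosen for illustration only).
A = np.array([[4.0, 1.0, 0.5],
              [1.0, 3.0, 1.0],
              [0.5, 1.0, 5.0]])

# Condition 1 (Weiss et al.): A is diagonally dominant, i.e. each
# diagonal entry exceeds the sum of absolute off-diagonal entries in its row.
diag = np.abs(np.diag(A))
off = np.abs(A).sum(axis=1) - diag
diag_dominant = bool(np.all(diag > off))

# Condition 2 (Johnson et al.): the spectral radius of
# |I - D^(-1/2) A D^(-1/2)| is smaller than one, where D = diag(A).
D_inv_sqrt = np.diag(1.0 / np.sqrt(np.diag(A)))
R = np.abs(np.eye(3) - D_inv_sqrt @ A @ D_inv_sqrt)
rho = np.max(np.abs(np.linalg.eigvals(R)))

# For this A both sufficient conditions hold.
assert diag_dominant and rho < 1.0
```

Note that each condition is sufficient but not necessary, so GaBP may still converge on matrices that fail both checks.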

For each case, the convergence condition involves verifying 1) that a set (determined by A) is non-empty, 2) that the spectral radius of a certain matrix is smaller than one, and 3) that the singularity issue (when converting a BP message into a belief) does not occur.[19]

Additionally, the GaBP algorithm is shown to be immune to numerical problems of the preconditioned conjugate gradient method.[20]

The previous description of the BP algorithm is called codeword-based decoding, which calculates the approximate marginal probability