Dependency networks (DNs) are graphical models, similar to Markov networks, wherein each vertex (node) corresponds to a random variable and each edge captures dependencies among variables.
Unlike Bayesian networks, DNs may contain cycles.
Each node is associated with a conditional probability table, which specifies the distribution of the random variable given its parents.[1]
In a Bayesian network, the Markov blanket of a node is the set of its parents and children, together with its children's parents; the children's parents must be included because they can be used to explain away the node in question.
In a dependency network, the Markov blanket for a node is simply the set of its parents.
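This contrast can be made concrete. The sketch below (an illustration with a made-up graph; the encoding and function names are not from the original text) computes both kinds of Markov blanket from a parent map:

```python
# Markov blanket computation from a directed graph, given as a dict
# mapping each node to the set of its parents.

def bn_markov_blanket(parents, node):
    """Bayesian network: parents, children, and the children's other parents."""
    children = {v for v, ps in parents.items() if node in ps}
    coparents = {p for c in children for p in parents[c]} - {node}
    return parents[node] | children | coparents

def dn_markov_blanket(parents, node):
    """Dependency network: simply the node's parents."""
    return set(parents[node])

# Example graph: A -> C <- B, C -> D
parents = {"A": set(), "B": set(), "C": {"A", "B"}, "D": {"C"}}
print(sorted(bn_markov_blanket(parents, "A")))  # B is included: it co-parents C
print(sorted(dn_markov_blanket(parents, "C")))  # just C's parents
```

In the Bayesian network, node B enters A's Markov blanket only through their shared child C (the explaining-away path); in the dependency network that path is irrelevant.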
In particular, dependency networks are easier to parameterize from data, as there are efficient algorithms for learning both the structure and the probabilities of a dependency network from data.
Such algorithms are not available for Bayesian networks, for which the problem of determining the optimal structure is NP-hard.[2]
Dependency networks also have disadvantages: they may be more difficult to construct using a knowledge-based approach driven by expert knowledge.
Moreover, it is possible to construct non-consistent dependency networks, i.e., dependency networks for which there is no compatible valid joint probability distribution.
A consistent dependency network for a set of random variables X = (X_1, ..., X_n) with joint distribution p(x) is a cyclic directed graph, where each of its nodes corresponds to a variable in X and is associated with a local conditional distribution p(x_i | pa_i), with pa_i denoting the parents of X_i. The parents of each node are the variables that satisfy the following independence relationship:

p(x_i | pa_i) = p(x_i | x_1, ..., x_{i-1}, x_{i+1}, ..., x_n).

The dependency network is consistent in the sense that each local distribution can be obtained from the joint distribution p(x).
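As a small illustration of consistency (a sketch with an invented joint distribution, not taken from the original text), the local distributions of a two-variable network can be read off directly from the joint:

```python
# Deriving consistent local distributions from a joint over two
# binary variables X1, X2. joint[(x1, x2)] = p(X1 = x1, X2 = x2).
joint = {(0, 0): 0.3, (0, 1): 0.2, (1, 0): 0.1, (1, 1): 0.4}

def conditional(joint, target, given_value):
    """p(X_target = 1 | other variable = given_value), by direct marginalization."""
    if target == 0:  # p(x1 = 1 | x2 = given_value)
        num = joint[(1, given_value)]
        den = joint[(0, given_value)] + joint[(1, given_value)]
    else:            # p(x2 = 1 | x1 = given_value)
        num = joint[(given_value, 1)]
        den = joint[(given_value, 0)] + joint[(given_value, 1)]
    return num / den

print(conditional(joint, target=0, given_value=1))  # 0.4 / (0.2 + 0.4)
print(conditional(joint, target=1, given_value=0))  # 0.2 / (0.3 + 0.2)
```

A non-consistent network would specify local tables for which no such joint exists.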
When a pair of local distributions is incompatible in this way, there is no joint probability distribution that satisfies the independence relationships subsumed by that pair.
Two important tasks in a dependency network are to learn its structure and probabilities from data.
Essentially, the learning algorithm consists of independently performing a probabilistic regression or classification for each variable in the domain.
Each local distribution p(x_i | pa_i) can be estimated by any number of classification or regression techniques, such as methods using a probabilistic decision tree, a neural network, or a probabilistic support-vector machine.
For each variable X_i, we independently estimate its local distribution from data using a classification algorithm; a distinct method may even be used for each variable.
Here, we will briefly show how probabilistic decision trees are used to estimate the local distributions.
To learn a decision tree for the local distribution of a variable X_i, the search algorithm begins with a singleton root node without children.
Then, each leaf node in the tree is replaced with a binary split on some variable X_j (with j ≠ i), as long as the replacement increases the score of the tree.
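A minimal sketch of estimating one local distribution with a probabilistic decision tree, here using scikit-learn's `DecisionTreeClassifier` as a stand-in for the tree learner described above (the data and variable choice are invented for illustration):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Toy binary data over three variables X0, X1, X2, where X2 tends to
# equal X0 XOR X1, so its local distribution depends on both others.
X01 = rng.integers(0, 2, size=(500, 2))
noise = (rng.random(500) < 0.1).astype(int)
x2 = (X01[:, 0] ^ X01[:, 1]) ^ noise

# Local distribution p(x2 | x0, x1): fit a tree predicting X2 from the rest.
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(X01, x2)

# predict_proba returns the leaf's empirical class distribution,
# i.e. an estimate of p(x2 | x0, x1) for each query configuration.
probs = tree.predict_proba([[0, 1], [1, 1]])
print(probs)  # high p(x2=1) for (0, 1), high p(x2=0) for (1, 1)
```

Each leaf of the fitted tree stores a probability distribution over X2, exactly the form of local distribution a dependency network needs.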
One alternative for performing probabilistic inference is Gibbs sampling.
A naive approach uses an ordered Gibbs sampler; an important difficulty of this approach is that if either p(y | x) or the evidence probability p(x) is small, then many iterations are required for an accurate probability estimate.
Another approach for estimating p(y | x) when p(x) is small is to use a modified ordered Gibbs sampler, where the evidence variables x are fixed to their observed values during sampling.
Furthermore, the law of total probability, along with the independencies encoded in a dependency network, can be used to decompose a single inference task into a set of inference tasks on single variables.
This approach comes with the advantage that some terms may be obtained by direct lookup, thereby avoiding some Gibbs sampling.
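The modified ordered Gibbs sampler can be sketched as follows (a toy illustration with hand-written local distributions and invented numbers, not the original authors' code):

```python
import random

random.seed(0)

# Local distributions p(x_i = 1 | rest) for a three-variable dependency
# network with a cycle x0 -> x2 -> x1 -> x0 (toy numbers).
def p_x0(state): return 0.8 if state["x1"] == 1 else 0.2
def p_x1(state): return 0.7 if state["x2"] == 1 else 0.3
def p_x2(state): return 0.6 if state["x0"] == 1 else 0.4
local = {"x0": p_x0, "x1": p_x1, "x2": p_x2}

def modified_ordered_gibbs(evidence, target, n_iter=20000):
    """Estimate p(target = 1 | evidence), keeping evidence clamped."""
    state = {v: evidence.get(v, 0) for v in local}
    hits = 0
    for _ in range(n_iter):
        # Resample each unobserved variable in a fixed order.
        for v in local:
            if v in evidence:
                continue  # evidence variables stay fixed
            state[v] = 1 if random.random() < local[v](state) else 0
        hits += state[target]
    return hits / n_iter

est = modified_ordered_gibbs({"x2": 1}, "x0")
print(est)  # estimate of p(x0 = 1 | x2 = 1); exact value here is 0.62
```

Because the evidence never changes, no iterations are wasted on states inconsistent with it, which is the advantage over the naive sampler when p(x) is small.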
In addition to probabilistic inference, dependency networks have been applied to collaborative filtering (CF), the task of predicting preferences.
Dependency networks are a natural model class on which to base CF predictions, since an algorithm for this task only needs estimates of the local distributions p(x_i | x \ x_i).
In particular, these estimates may be obtained by a direct lookup in a dependency network.
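Such a direct-lookup prediction might look like the following sketch, where the local distributions are represented as plain tables (the item names and probabilities are invented; in practice each table would come from a learned decision tree):

```python
# Collaborative filtering by direct lookup in a dependency network.
# For each item, a table maps the states of its parent items to
# p(item is liked | parents).
local_tables = {
    "movie_b": {"parents": ["movie_a"], "table": {(0,): 0.2, (1,): 0.9}},
    "movie_c": {"parents": ["movie_a"], "table": {(0,): 0.5, (1,): 0.3}},
}

def recommend(user_prefs, candidates):
    """Rank candidate items by p(liked | observed preferences)."""
    scores = {}
    for item in candidates:
        spec = local_tables[item]
        key = tuple(user_prefs[p] for p in spec["parents"])
        scores[item] = spec["table"][key]  # direct lookup, no sampling
    return sorted(scores, key=scores.get, reverse=True)

# A user who liked movie_a: movie_b (0.9) ranks above movie_c (0.3).
print(recommend({"movie_a": 1}, ["movie_b", "movie_c"]))
```

The key point is that no Gibbs sampling is needed: each prediction is a single table (or tree) lookup.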