In probability theory, conditional independence describes situations wherein an observation is irrelevant or redundant when evaluating the certainty of a hypothesis.
If $A$ is the hypothesis, and $B$ and $C$ are observations, conditional independence can be stated as an equality: $P(A \mid B, C) = P(A \mid C)$, where $P(A \mid B, C)$ is the probability of $A$ given both $B$ and $C$. Since the probability of $A$ given $C$ is the same as the probability of $A$ given both $B$ and $C$, this equality expresses that $B$ contributes nothing to the certainty of $A$. In this case, $A$ and $B$ are said to be conditionally independent given $C$, written symbolically as $(A \perp\!\!\!\perp B \mid C)$.
The concept of conditional independence is essential to graph-based theories of statistical inference, as it establishes a mathematical relation between a collection of conditional statements and a graphoid.
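To make the defining equality concrete, the following Python sketch (an illustration with assumed numbers and variable names, not part of the original article) builds a small joint distribution in which $A$ and $B$ are conditionally independent given $C$ and checks numerically that $P(A \mid B, C) = P(A \mid C)$:

```python
# Illustrative sketch: verify P(A | B, C) = P(A | C) on a toy joint distribution
# constructed so that A and B are conditionally independent given C.
# All numbers below are assumptions chosen for the example.
from itertools import product

p_c = {0: 0.3, 1: 0.7}           # P(C = c)
p_a_given_c = {0: 0.9, 1: 0.2}   # P(A = 1 | C = c)
p_b_given_c = {0: 0.6, 1: 0.1}   # P(B = 1 | C = c)

def bern(p, x):
    """Probability that a Bernoulli(p) variable takes the value x (0 or 1)."""
    return p if x == 1 else 1 - p

# Joint distribution P(A=a, B=b, C=c) = P(C=c) P(A=a | C=c) P(B=b | C=c),
# which encodes the conditional independence of A and B given C by construction.
joint = {(a, b, c): p_c[c] * bern(p_a_given_c[c], a) * bern(p_b_given_c[c], b)
         for a, b, c in product((0, 1), repeat=3)}

def prob(pred):
    """Total probability of all outcomes (a, b, c) satisfying the predicate."""
    return sum(p for outcome, p in joint.items() if pred(*outcome))

for c0 in (0, 1):
    p_a_given_bc = (prob(lambda a, b, c: a == 1 and b == 1 and c == c0)
                    / prob(lambda a, b, c: b == 1 and c == c0))
    p_a_given_c_only = (prob(lambda a, b, c: a == 1 and c == c0)
                        / prob(lambda a, b, c: c == c0))
    print(f"C={c0}:  P(A|B,C) = {p_a_given_bc:.4f}   P(A|C) = {p_a_given_c_only:.4f}")
```

For each value of $C$ the two printed probabilities agree, which is exactly the defining equality.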
Let $A$ and $B$ be the events that persons A and B, respectively, will be home in time for dinner, where both people are chosen at random from the entire world. Absent any other information, the two events can be assumed to be independent: learning that person A will be late tells us essentially nothing about whether person B will be late. However, if a third event is introduced, namely that person A and person B live in the same neighborhood, the two events are no longer conditionally independent: given that they share a neighborhood, learning that person A is late (for example, because of local traffic) makes it more likely that person B is also late.[2]
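The dinner example can be mimicked with a small simulation. The sketch below is illustrative only: the neighborhood-traffic mechanism and all probabilities are assumptions introduced for the example, not taken from the article.

```python
# Illustrative simulation (assumed mechanism and numbers): lateness is driven by
# whether a person's neighborhood has a traffic jam tonight.  Two people from
# different neighborhoods are independent; conditioning on "same neighborhood"
# makes their lateness events dependent, because they share the same jam.
import random

random.seed(0)
P_JAM = 0.3          # probability a neighborhood has a traffic jam tonight
P_LATE_JAM = 0.8     # probability of being late given a jam
P_LATE_NO_JAM = 0.1  # probability of being late otherwise

def late(jam):
    return random.random() < (P_LATE_JAM if jam else P_LATE_NO_JAM)

def trial(same_neighborhood):
    jam_a = random.random() < P_JAM
    jam_b = jam_a if same_neighborhood else random.random() < P_JAM
    return late(jam_a), late(jam_b)

def dependence(same_neighborhood, trials=100_000):
    results = [trial(same_neighborhood) for _ in range(trials)]
    p_a = sum(a for a, _ in results) / trials
    p_b = sum(b for _, b in results) / trials
    p_ab = sum(a and b for a, b in results) / trials
    return p_ab - p_a * p_b   # approximately 0 iff the two events are independent

print("different neighborhoods:", round(dependence(False), 3))  # ~0.0
print("same neighborhood:      ", round(dependence(True), 3))   # clearly > 0
```

Here the shared traffic jam plays the role of the common factor that makes the two lateness events dependent once we condition on living in the same neighborhood.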
Conditional independence depends on the nature of the third event. For example, height and vocabulary are dependent when no age is given, since very small people tend to be children, who are known for their more basic vocabularies. But knowing that two people are 19 years old (i.e., conditional on age), there is no reason to think that one person's vocabulary is larger if we are told that they are taller.
Two discrete random variables $X$ and $Y$ are conditionally independent given a third discrete random variable $Z$ if and only if they are independent in their conditional probability distribution given $Z$. That is, $X$ and $Y$ are conditionally independent given $Z$ if and only if, for any value of $Z$, the probability distribution of $X$ is the same for all values of $Y$ and the probability distribution of $Y$ is the same for all values of $X$. Formally, $X \perp\!\!\!\perp Y \mid Z \iff F_{X,Y\mid Z=z}(x,y) = F_{X\mid Z=z}(x)\, F_{Y\mid Z=z}(y)$ for all $x$, $y$ and $z$, where $F_{X,Y\mid Z=z}(x,y) = P(X \le x, Y \le y \mid Z = z)$ is the conditional cumulative distribution function of $X$ and $Y$ given $Z = z$.

Two events $R$ and $B$ are conditionally independent given a σ-algebra $\Sigma$ if $P(R, B \mid \Sigma) = P(R \mid \Sigma)\, P(B \mid \Sigma)$ almost surely, where $P(A \mid \Sigma)$ denotes the conditional expectation of the indicator function of the event $A$, $\chi_A$, given the σ-algebra $\Sigma$; that is, $P(A \mid \Sigma) := \operatorname{E}[\chi_A \mid \Sigma]$. Two random variables $X$ and $Y$ are conditionally independent given a σ-algebra $\Sigma$ if the above equality holds for every $R$ in $\sigma(X)$ and every $B$ in $\sigma(Y)$.

Two random variables $X$ and $Y$ are conditionally independent given a random variable $W$ if they are independent given $\sigma(W)$: the σ-algebra generated by $W$. This is commonly written $X \perp\!\!\!\perp Y \mid W$ and read "$X$ is independent of $Y$, given $W$". If $W$ assumes a countable set of values, this is equivalent to the conditional independence of $X$ and $Y$ for the events of the form $[W = w]$.
Conditional independence of more than two events, or of more than two random variables, is defined analogously.
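Under the discrete definition above, conditional independence can be tested directly from a joint probability mass function. The following Python helper is a sketch (the function name and the example tables are assumptions, not from the article) that checks the factorization $p(x, y \mid z) = p(x \mid z)\, p(y \mid z)$ for every combination of values:

```python
# Illustrative helper: test whether discrete X and Y are conditionally
# independent given Z, from a joint pmf given as {(x, y, z): probability}.
from collections import defaultdict

def conditionally_independent(joint, tol=1e-9):
    """Check p(x, y | z) == p(x | z) * p(y | z) for all x, y and z."""
    xs = {x for x, _, _ in joint}
    ys = {y for _, y, _ in joint}
    zs = {z for _, _, z in joint}
    p_z = defaultdict(float)
    p_xz = defaultdict(float)
    p_yz = defaultdict(float)
    for (x, y, z), p in joint.items():
        p_z[z] += p
        p_xz[(x, z)] += p
        p_yz[(y, z)] += p
    for z in zs:
        if p_z[z] == 0:
            continue
        for x in xs:
            for y in ys:
                lhs = joint.get((x, y, z), 0.0) / p_z[z]
                rhs = (p_xz[(x, z)] / p_z[z]) * (p_yz[(y, z)] / p_z[z])
                if abs(lhs - rhs) > tol:
                    return False
    return True

# X and Y are i.i.d. fair coin flips and Z = X + Y: X and Y are independent,
# but not conditionally independent given Z (knowing Z = 1 and X = 0 forces Y = 1).
xor_like = {(x, y, x + y): 0.25 for x in (0, 1) for y in (0, 1)}
print(conditionally_independent(xor_like))   # False

# A joint built as P(Z) P(X | Z) P(Y | Z) is conditionally independent by construction.
ci = {}
for z, pz in [(0, 0.5), (1, 0.5)]:
    px1 = 0.9 if z == 0 else 0.2   # P(X = 1 | Z = z)
    py1 = 0.6 if z == 0 else 0.1   # P(Y = 1 | Z = z)
    for x in (0, 1):
        for y in (0, 1):
            ci[(x, y, z)] = pz * (px1 if x else 1 - px1) * (py1 if y else 1 - py1)
print(conditionally_independent(ci))         # True
```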
Independence and conditional independence do not imply one another. For example, suppose $W$ is 0 with probability 0.5 and 1 otherwise. When $W = 0$, let $X$ and $Y$ be independent, each taking the value 0 with probability 0.99 and the value 1 otherwise; when $W = 1$, $X$ and $Y$ are again independent, but this time they take the value 1 with probability 0.99. Then $X$ and $Y$ are conditionally independent given $W$, yet they are dependent when $W$ is not observed, since observing $X = 1$ makes it far more likely that $W = 1$ and hence that $Y = 1$. Conversely, two independent variables need not be conditionally independent given a third. This is the phenomenon of "explaining away": for instance, if a school admits students who are either brainy or sporty, then among admitted students, with $X$ and $Y$ taking the values "brainy" and "sporty", learning that a student is sporty makes it less likely that the student is brainy, even if the two traits are independent in the population at large.
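The mixture example can be checked exactly. The short Python sketch below (following the 0.99 construction described above; the function name is illustrative) confirms that $X$ and $Y$ factorize given $W$ but not marginally:

```python
# Exact probabilities for the mixture example: W is 0 or 1 with probability 0.5,
# and given W the variables X and Y are i.i.d., equal to 1 with probability
# 0.01 (when W = 0) or 0.99 (when W = 1).
from itertools import product

def p_joint(x, y, w):
    """P(X = x, Y = y, W = w) under the mixture construction."""
    p1 = 0.01 if w == 0 else 0.99              # P(X = 1 | W = w) = P(Y = 1 | W = w)
    px = p1 if x == 1 else 1 - p1
    py = p1 if y == 1 else 1 - p1
    return 0.5 * px * py                       # P(W = w) = 0.5

# Conditional independence given W: P(X=1, Y=1 | W) equals P(X=1 | W) P(Y=1 | W).
for w in (0, 1):
    pw = sum(p_joint(x, y, w) for x, y in product((0, 1), repeat=2))
    p_xy = p_joint(1, 1, w) / pw
    p_x = sum(p_joint(1, y, w) for y in (0, 1)) / pw
    p_y = sum(p_joint(x, 1, w) for x in (0, 1)) / pw
    print(f"W={w}: P(X=1,Y=1|W) = {p_xy:.4f}  vs  P(X=1|W) P(Y=1|W) = {p_x * p_y:.4f}")

# Marginal dependence: P(X=1, Y=1) is far from P(X=1) P(Y=1).
p_x1y1 = sum(p_joint(1, 1, w) for w in (0, 1))
p_x1 = sum(p_joint(1, y, w) for y in (0, 1) for w in (0, 1))
print(f"P(X=1,Y=1) = {p_x1y1:.4f}  vs  P(X=1) P(Y=1) = {p_x1 * p_x1:.4f}")
```

The conditional products match exactly for each value of $W$, while marginally $P(X=1, Y=1) = 0.4901$ against $P(X=1)\,P(Y=1) = 0.25$, showing dependence once $W$ is unobserved.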
Let p be the proportion of voters who will vote "yes" in an upcoming referendum.
In taking an opinion poll, one chooses n voters randomly from the population.
For i = 1, ..., n, let Xi = 1 or 0 according to whether or not the ith chosen voter will vote "yes".
In a frequentist approach to statistical inference one would not attribute any probability distribution to p (unless the probabilities could be somehow interpreted as relative frequencies of occurrence of some event or as proportions of some population) and one would say that X1, ..., Xn are independent random variables.
By contrast, in a Bayesian approach to statistical inference, one would assign a probability distribution to p regardless of the non-existence of any such "frequency" interpretation, and one would construe the probabilities as degrees of belief that p is in any interval to which a probability is assigned.
In that model, the random variables X1, ..., Xn are not independent, but they are conditionally independent given the value of p. In particular, if a large number of the Xs are observed to be equal to 1, that would imply a high conditional probability, given that observation, that p is near 1, and thus a high conditional probability, given that observation, that the next X to be observed will be equal to 1.
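This effect can be made concrete with a conjugate Beta-Bernoulli model. The sketch below is illustrative: the uniform Beta(1, 1) prior and the observed counts are assumptions chosen for the example, not values from the text.

```python
# Posterior predictive probability that the next voter says "yes", under a
# Beta(a, b) prior on p and an observed sample with the given counts of 1s and 0s.
# Given p the X_i are i.i.d.; marginally they are dependent, which shows up as
# the predictive probability moving toward the observed frequency.
def posterior_predictive(ones, zeros, a=1.0, b=1.0):
    """P(next X = 1 | data) = (a + ones) / (a + b + ones + zeros)."""
    return (a + ones) / (a + b + ones + zeros)

print(posterior_predictive(0, 0))     # 0.5    (no data: prior predictive)
print(posterior_predictive(9, 1))     # ~0.833 (mostly 1s observed)
print(posterior_predictive(90, 10))   # ~0.892 (more data, stronger belief p is near 0.9)
```

If the Xi were marginally independent, the predictive probability for the next observation would not change with the data; here it does, precisely because the Xi are only conditionally independent given p.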
A set of rules governing statements of conditional independence has been derived from the basic definition.[4][5] These rules were termed "Graphoid Axioms" by Pearl and Paz,[6] because they hold in graphs, where $X \perp\!\!\!\perp A \mid B$ is interpreted to mean: "All paths from $X$ to $A$ are intercepted by the set $B$".[7]
Symmetry: $X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X$. Proof: we are required to prove that if $P(X \mid Y) = P(X)$ then $P(Y \mid X) = P(Y)$. This follows because $P(X \mid Y) = P(X)$ is equivalent to $P(X, Y) = P(X)\,P(Y)$, which is symmetric in $X$ and $Y$.

Decomposition: $X \perp\!\!\!\perp A, B \Rightarrow X \perp\!\!\!\perp A$ and $X \perp\!\!\!\perp B$. Proof: $P(X, A) = \sum_b P(X, A, B = b) = \sum_b P(X)\,P(A, B = b) = P(X)\,P(A)$, so $X \perp\!\!\!\perp A$. The second condition can be proved similarly.

Weak union: $X \perp\!\!\!\perp A, B \Rightarrow X \perp\!\!\!\perp B \mid A$. Proof: by assumption $P(X \mid A, B) = P(X)$, and by decomposition $P(X \mid A) = P(X)$; hence $P(X \mid A, B) = P(X \mid A)$, which is $X \perp\!\!\!\perp B \mid A$.

Contraction: $X \perp\!\!\!\perp A \mid B$ and $X \perp\!\!\!\perp B \Rightarrow X \perp\!\!\!\perp A, B$. Proof: this property can be proved by noticing $P(X \mid A, B) = P(X \mid B) = P(X)$, where the first equality uses $X \perp\!\!\!\perp A \mid B$ and the second uses $X \perp\!\!\!\perp B$.

For strictly positive probability distributions,[5] the following also holds. Intersection: $X \perp\!\!\!\perp A \mid B$ and $X \perp\!\!\!\perp B \mid A \Rightarrow X \perp\!\!\!\perp A, B$. Proof: by assumption, $P(X \mid A, B) = P(X \mid B)$ and $P(X \mid A, B) = P(X \mid A)$, therefore $P(X \mid A) = P(X \mid B)$. Using this equality, together with the law of total probability applied to $P(X)$: $P(X) = \sum_a P(X \mid A = a)\,P(A = a) = \sum_a P(X \mid B)\,P(A = a) = P(X \mid B)$. Since $P(X \mid A, B) = P(X \mid B)$ and $P(X \mid B) = P(X)$, it follows that $P(X \mid A, B) = P(X)$, i.e., $X \perp\!\!\!\perp A, B$.
Technical note: since these implications hold for any probability space, they will still hold if one considers a sub-universe by conditioning everything on another variable, say $K$. For example, $X \perp\!\!\!\perp Y \Rightarrow Y \perp\!\!\!\perp X$ would also mean that $X \perp\!\!\!\perp Y \mid K \Rightarrow Y \perp\!\!\!\perp X \mid K$.
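These axioms can be spot-checked numerically. The Python sketch below (toy numbers and helper names are assumptions) constructs a joint distribution in which $X$ is independent of the pair $(A, B)$ and verifies the conclusion of the decomposition rule:

```python
# Numerical illustration of the decomposition axiom with assumed toy numbers:
# build a joint distribution where X is independent of the pair (A, B), with A
# and B themselves dependent, then check X ⊥⊥ A and X ⊥⊥ B separately.
from itertools import product

p_x = {0: 0.4, 1: 0.6}                                        # P(X)
p_ab = {(0, 0): 0.5, (0, 1): 0.1, (1, 0): 0.1, (1, 1): 0.3}   # P(A, B); A and B dependent

# Joint P(X = x, A = a, B = b) = P(X = x) P(A = a, B = b) encodes X ⊥⊥ (A, B).
joint = {(x, a, b): p_x[x] * p_ab[(a, b)]
         for x, (a, b) in product(p_x, p_ab)}

def independent(joint, i, j, tol=1e-9):
    """Check whether coordinates i and j of the outcome tuples are independent."""
    vals_i = {k[i] for k in joint}
    vals_j = {k[j] for k in joint}
    def marg(idx, v):
        return sum(p for k, p in joint.items() if k[idx] == v)
    def pair(vi, vj):
        return sum(p for k, p in joint.items() if k[i] == vi and k[j] == vj)
    return all(abs(pair(vi, vj) - marg(i, vi) * marg(j, vj)) <= tol
               for vi in vals_i for vj in vals_j)

print(independent(joint, 0, 1))   # True:  X ⊥⊥ A, the first conclusion of decomposition
print(independent(joint, 0, 2))   # True:  X ⊥⊥ B, the second conclusion
print(independent(joint, 1, 2))   # False: A and B remain dependent on each other
```

The same table-based check, applied after conditioning every probability on a further variable K, would illustrate the technical note above.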