Chain rule (probability)

In probability theory, the chain rule[1] (also called the general product rule[2][3]) describes how to calculate the probability of the intersection of events that are not necessarily independent, or the joint distribution of random variables, using conditional probabilities. This rule allows one to express a joint probability in terms of only conditional probabilities.[4] The rule is notably used in the context of discrete stochastic processes and in applications such as the study of Bayesian networks, which describe a probability distribution in terms of conditional probabilities.

For two events $A$ and $B$, the chain rule states that

$$P(A \cap B) = P(B \mid A)\, P(A),$$

where $P(B \mid A)$ denotes the conditional probability of $B$ given $A$.
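As a quick sanity check of the two-event formula, the following sketch enumerates a small finite sample space and compares both sides exactly. The two-dice setting and the particular events are assumptions made for illustration only; the conditional probability is computed by counting inside the reduced sample space $A$.

```python
from fractions import Fraction
from itertools import product

# Assumed sample space: all ordered outcomes of two fair six-sided dice.
omega = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a set of outcomes) under the uniform measure."""
    return Fraction(len(event), len(omega))

# Hypothetical events chosen for illustration; they are not independent.
A = {w for w in omega if w[0] + w[1] >= 9}  # the sum is at least 9
B = {w for w in omega if w[0] == 6}         # the first die shows 6

# P(B | A) by counting within the reduced sample space A.
p_B_given_A = Fraction(len(A & B), len(A))

lhs = prob(A & B)            # P(A ∩ B) computed directly
rhs = p_B_given_A * prob(A)  # P(B | A) * P(A) via the chain rule

print(lhs, rhs)  # 1/9 1/9
```

Using `Fraction` keeps the arithmetic exact, so the two sides can be compared with `==` rather than a floating-point tolerance.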

Urn A contains 1 black ball and 2 white balls, and Urn B contains 1 black ball and 3 white balls. Suppose we pick an urn at random and then select a ball from that urn. Let event $A$ be choosing the first urn (Urn A), i.e. $P(A) = P(\overline{A}) = 1/2$, where $\overline{A}$ is the complementary event of $A$. Let event $B$ be choosing a white ball. The chance of choosing a white ball, given that we have chosen the first urn, is $P(B \mid A) = 2/3$. The intersection $A \cap B$ then describes choosing the first urn and a white ball from it.

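The probability can be calculated by the chain rule as follows:

$$P(A \cap B) = P(B \mid A)\, P(A) = \frac{2}{3} \cdot \frac{1}{2} = \frac{1}{3}.$$

For events $A_1, \ldots, A_n$ whose intersection has nonzero probability, the chain rule states

$$\begin{aligned} P(A_1 \cap A_2 \cap \ldots \cap A_n) &= P(A_n \mid A_1 \cap \ldots \cap A_{n-1})\, P(A_1 \cap \ldots \cap A_{n-1}) \\ &= P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) \cdots P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) \\ &= \prod_{k=1}^{n} P\Big(A_k \,\Big|\, \bigcap_{j=1}^{k-1} A_j\Big), \end{aligned}$$

where the empty intersection for $k = 1$ is taken to be the whole sample space, so that the first factor is $P(A_1)$.

For $n = 4$, i.e. four events, the chain rule reads

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = P(A_4 \mid A_3 \cap A_2 \cap A_1)\, P(A_3 \mid A_2 \cap A_1)\, P(A_2 \mid A_1)\, P(A_1).$$

We randomly draw 4 cards (one at a time) without replacement from a deck of 52 cards. What is the probability that we have picked 4 aces?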

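Let $A_n$ denote the event that an ace is drawn on the $n$-th draw. We get the following probabilities:

$$P(A_1) = \frac{4}{52}, \quad P(A_2 \mid A_1) = \frac{3}{51}, \quad P(A_3 \mid A_1 \cap A_2) = \frac{2}{50}, \quad P(A_4 \mid A_1 \cap A_2 \cap A_3) = \frac{1}{49}.$$

Applying the chain rule,

$$P(A_1 \cap A_2 \cap A_3 \cap A_4) = \frac{4}{52} \cdot \frac{3}{51} \cdot \frac{2}{50} \cdot \frac{1}{49} = \frac{24}{6497400} = \frac{1}{270725}.$$

Let $(\Omega, \mathcal{A}, P)$ be a probability space. Recall that the conditional probability of an $A \in \mathcal{A}$ given $B \in \mathcal{A}$ is defined as

$$P(A \mid B) := \begin{cases} \dfrac{P(A \cap B)}{P(B)}, & P(B) > 0, \\ 0, & P(B) = 0. \end{cases}$$

The chain rule can then be stated as follows. Let $(\Omega, \mathcal{A}, P)$ be a probability space and let $A_1, \ldots, A_n \in \mathcal{A}$. Then

$$P(A_1 \cap A_2 \cap \ldots \cap A_n) = P(A_1)\, P(A_2 \mid A_1) \cdots P(A_n \mid A_1 \cap \ldots \cap A_{n-1}) = P(A_1) \prod_{j=2}^{n} P(A_j \mid A_1 \cap \ldots \cap A_{j-1}).$$

The formula follows immediately by recursion:

$$\begin{aligned} P(A_1)\, P(A_2 \mid A_1) &= P(A_1 \cap A_2), \\ P(A_1)\, P(A_2 \mid A_1)\, P(A_3 \mid A_1 \cap A_2) &= P(A_1 \cap A_2)\, P(A_3 \mid A_1 \cap A_2) = P(A_1 \cap A_2 \cap A_3), \end{aligned}$$

where we used the definition of the conditional probability in the first step.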
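The urn and four-ace calculations above can be reproduced with a few lines of exact arithmetic. This is a minimal sketch: the helper name `chain_rule` is hypothetical, and it simply multiplies the conditional probabilities $P(A_1), P(A_2 \mid A_1), \ldots$ that the caller supplies. The card result is cross-checked by counting, since exactly one of the $\binom{52}{4}$ equally likely four-card hands consists of the four aces.

```python
from fractions import Fraction
from math import comb, prod

def chain_rule(conditionals):
    """Multiply P(A_1), P(A_2 | A_1), ..., P(A_n | A_1 ∩ ... ∩ A_{n-1})."""
    return prod(conditionals, start=Fraction(1))

# Urn example: P(A) = 1/2 and P(B | A) = 2/3.
p_urn = chain_rule([Fraction(1, 2), Fraction(2, 3)])

# Four aces: before draw k (counting from 0) there are 4 - k aces among 52 - k cards.
p_aces = chain_rule([Fraction(4 - k, 52 - k) for k in range(4)])

# Independent check by counting: one favourable hand out of C(52, 4).
p_aces_by_counting = Fraction(1, comb(52, 4))

print(p_urn)   # 1/3
print(p_aces)  # 1/270725
```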

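For two discrete random variables $X, Y$, we use the events $A := \{X = x\}$ and $B := \{Y = y\}$ in the definition above, and find the joint distribution as

$$P(X = x, Y = y) = P(X = x \mid Y = y)\, P(Y = y),$$

or

$$P_{(X,Y)}(x, y) = P_{X \mid Y}(x \mid y)\, P_Y(y),$$

where $P_Y(y) := P(Y = y)$ is the probability distribution of $Y$ and $P_{X \mid Y}(x \mid y)$ is the conditional probability distribution of $X$ given $Y$.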
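To make the two-variable formula concrete, the sketch below builds a joint distribution from a marginal $P_Y$ and a conditional $P_{X \mid Y}$. The variables (weather and mode of travel) and all numbers are invented for illustration; only the multiplication step is the chain rule itself.

```python
from fractions import Fraction

# Hypothetical marginal distribution P_Y(y) for Y in {"sun", "rain"}.
p_Y = {"sun": Fraction(3, 4), "rain": Fraction(1, 4)}

# Hypothetical conditional distribution P_{X|Y}(x | y) for X in {"walk", "bus"}.
p_X_given_Y = {
    "sun": {"walk": Fraction(4, 5), "bus": Fraction(1, 5)},
    "rain": {"walk": Fraction(1, 10), "bus": Fraction(9, 10)},
}

# Chain rule: P_{(X,Y)}(x, y) = P_{X|Y}(x | y) * P_Y(y).
joint = {
    (x, y): p_X_given_Y[y][x] * p_Y[y]
    for y in p_Y
    for x in p_X_given_Y[y]
}

# Marginal of X, obtained by summing the joint over y.
p_X = {}
for (x, y), p in joint.items():
    p_X[x] = p_X.get(x, Fraction(0)) + p

total = sum(joint.values())

print(joint[("walk", "sun")])  # 3/5
print(p_X["walk"])             # 5/8
print(total)                   # 1
```

Because each row of the conditional table sums to 1 and the marginal sums to 1, the resulting joint distribution is automatically normalized.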

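Let $X_1, \ldots, X_n$ be random variables and $x_1, \ldots, x_n \in \mathbb{R}$. By the definition of the conditional probability,

$$P(X_n = x_n, \ldots, X_1 = x_1) = P(X_n = x_n \mid X_{n-1} = x_{n-1}, \ldots, X_1 = x_1)\, P(X_{n-1} = x_{n-1}, \ldots, X_1 = x_1),$$

and using the chain rule, where we set $A_k := \{X_k = x_k\}$, we can find the joint distribution as

$$P(X_1 = x_1, \ldots, X_n = x_n) = P(X_1 = x_1)\, P(X_2 = x_2 \mid X_1 = x_1)\, P(X_3 = x_3 \mid X_1 = x_1, X_2 = x_2) \cdots P(X_n = x_n \mid X_1 = x_1, \ldots, X_{n-1} = x_{n-1}).$$

For $n = 3$, i.e. considering three random variables, the chain rule reads

$$\begin{aligned} P_{(X_1, X_2, X_3)}(x_1, x_2, x_3) &= P(X_1 = x_1, X_2 = x_2, X_3 = x_3) \\ &= P(X_3 = x_3 \mid X_2 = x_2, X_1 = x_1)\, P(X_2 = x_2 \mid X_1 = x_1)\, P(X_1 = x_1) \\ &= P_{X_3 \mid X_2, X_1}(x_3 \mid x_2, x_1)\, P_{X_2 \mid X_1}(x_2 \mid x_1)\, P_{X_1}(x_1). \end{aligned}$$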
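The three-variable factorization can also be checked in the other direction: start from a joint distribution, derive the marginal and conditional factors from it, and confirm that their product recovers the joint. The sketch below does this for an arbitrary, strictly positive joint distribution of three binary variables. The weights and the helper names `marginal`, `conditional` and `chain_rule_product` are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# Hypothetical joint pmf of three binary variables (X1, X2, X3).
# The weights are arbitrary positive numbers; they sum to 20.
weights = [1, 2, 3, 4, 1, 2, 3, 4]
outcomes = list(product((0, 1), repeat=3))
joint = {xs: Fraction(w, sum(weights)) for xs, w in zip(outcomes, weights)}

def marginal(prefix):
    """P(X1 = prefix[0], ..., Xk = prefix[k-1]), summing out the later variables."""
    k = len(prefix)
    return sum(p for xs, p in joint.items() if xs[:k] == prefix)

def conditional(prefix):
    """P(Xk = prefix[-1] | X1 = prefix[0], ..., X(k-1) = prefix[-2])."""
    # Assumes the conditioning event has positive probability.
    return marginal(prefix) / marginal(prefix[:-1])

def chain_rule_product(xs):
    """P(X1 = x1) * P(X2 = x2 | X1 = x1) * P(X3 = x3 | X1 = x1, X2 = x2)."""
    result = Fraction(1)
    for k in range(1, len(xs) + 1):
        result *= conditional(xs[:k])
    return result

factorization_holds = all(chain_rule_product(xs) == joint[xs] for xs in outcomes)
print(factorization_holds)  # True
```

The product telescopes: each conditional is a ratio of consecutive marginals, so all intermediate terms cancel and only the full joint probability remains.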