Rubin causal model

The Rubin causal model (RCM), also known as the Neyman–Rubin causal model,[1] is an approach to the statistical analysis of cause and effect based on the framework of potential outcomes, named after Donald Rubin.[2]

The potential outcomes framework was first proposed by Jerzy Neyman in his 1923 Master's thesis,[3] though he discussed it only in the context of completely randomized experiments.[4]

Rubin extended it into a general framework for thinking about causation in both observational and experimental studies.

To measure the causal effect of going to college for this person, we need to compare the outcome for the same individual in both alternative futures.[2]

This uncertainty can also be deduced using the concept of universal probability (any outcome can be generated randomly).

Thus, according to this point of view, the experimental impossibility of associating a hypothesis with a cause with certainty is a direct consequence of universal probability.

An estimate of the average causal effect (also referred to as the average treatment effect or ATE) can then be obtained by computing the difference in means between the treated (college-attending) and control (not-college-attending) samples.
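As a minimal sketch of this estimator, assuming hypothetical earnings data for the two samples (the figures below are purely illustrative), the difference in means can be computed as:

```python
def ate_difference_in_means(treated, control):
    """Mean outcome of treated units minus mean outcome of control units."""
    return sum(treated) / len(treated) - sum(control) / len(control)

# Hypothetical later earnings (in $1000s) for college attendees vs. non-attendees.
college = [52, 61, 48, 70, 55]
no_college = [40, 45, 38, 50, 42]

# Naive ATE estimate: difference in sample means.
print(ate_difference_in_means(college, no_college))  # 14.2
```

As the surrounding text notes, this naive comparison is only a valid causal estimate when treatment is assigned independently of the potential outcomes; with self-selection into college, it conflates the treatment effect with pre-existing differences.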

Rather, people may choose to attend college based on their financial situation, parents' education, and so on.

Many statistical methods have been developed for causal inference, such as propensity score matching.
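A sketch of the idea behind propensity score matching, using simulated data in which the propensity score (the probability of treatment given the confounder) is known by construction rather than estimated; all names and numbers are illustrative assumptions:

```python
import random

random.seed(0)

# Simulated observational data: parental income (a confounder) raises both
# the chance of attending college and later earnings.
n = 2000
units = []
for _ in range(n):
    income = random.uniform(0, 1)
    p_treat = 0.2 + 0.6 * income                         # known propensity score
    t = 1 if random.random() < p_treat else 0
    y = 30 + 20 * income + 10 * t + random.gauss(0, 2)   # true effect = 10
    units.append((p_treat, t, y))

# Match each treated unit to the control unit with the closest propensity
# score, then average the outcome differences (estimates the effect on the treated).
controls = [(p, y) for p, t, y in units if t == 0]
diffs = []
for p, t, y in units:
    if t == 1:
        cp, cy = min(controls, key=lambda c: abs(c[0] - p))
        diffs.append(y - cy)

att = sum(diffs) / len(diffs)
print(round(att, 2))  # should land near the true effect of 10
```

In practice the propensity score is unknown and must itself be estimated, typically with a logistic regression of treatment on the observed covariates; matching on the estimated score then plays the role shown here.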

Our definition of the causal effect of the E versus C treatment will reflect this intuitive meaning.

In most circumstances, we are interested in comparing two futures, one generally termed "treatment" and the other "control".

In general, this notation expresses the potential outcome which results from a treatment, t, on a unit, u.
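As an illustrative sketch, this notation can be mimicked with a lookup table of hypothetical potential outcomes (the units and values below are invented for illustration); the unit-level causal effect is then the difference between the two potential outcomes for the same unit:

```python
# Potential outcomes Y_t(u): outcome of unit u under treatment t.
# All units and values here are hypothetical.
Y = {
    "treat":   {"Joe": 130, "Mary": 140},
    "control": {"Joe": 135, "Mary": 150},
}

def causal_effect(u):
    """Unit-level causal effect: Y_treat(u) - Y_control(u)."""
    return Y["treat"][u] - Y["control"][u]

print(causal_effect("Joe"))   # -5
print(causal_effect("Mary"))  # -10
```

The fundamental problem of causal inference is that, for any real unit, only one row of this table is ever observed.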

If one of the potential outcomes can never be observed, even in theory, then the causal effect of treatment on Joe's blood pressure is not defined.

The causal effect of the new drug is well defined because it is the simple difference of two potential outcomes, both of which might happen.

This definition of causal effects becomes much more problematic if there is no way for one of the potential outcomes to happen, ever.

We just need to compare two potential outcomes: what Joe's weight would be under the treatment (where treatment is defined as being 3 inches taller) and what it would be under the control (where control is defined as his current height).

A moment's reflection highlights the problem: we can't increase Joe's height.

This is called the stable unit treatment value assumption (SUTVA), which goes beyond the concept of independence.

In the context of our example, Joe's blood pressure should not depend on whether or not Mary receives the drug.

One is the causal effect of the drug on Joe when Mary receives the treatment; the other is the causal effect on Joe when she does not.

In order to (easily) estimate the causal effect of a single treatment relative to a control, SUTVA should hold.

Then, depending on the exact numbers, the average causal effect might be an increase in blood pressure.

The absolute size of the causal effect is −14, but the percentage difference (in terms of the treatment level of 140) is −10%.
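The arithmetic here is simply the absolute difference and its ratio to the treatment-level outcome; assuming a control-level outcome of 154 (consistent with the stated effect of −14 and treatment level of 140):

```python
treated_bp = 140   # blood pressure under treatment (from the example)
control_bp = 154   # assumed control value consistent with an effect of -14

effect = treated_bp - control_bp
percent = effect / treated_bp * 100  # relative to the treatment level

print(effect, percent)  # -14 -10.0
```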

It is impossible, by definition, to observe the effect of more than one treatment on a subject over a specific time period.

After assigning treatments randomly, we might estimate the causal effect as the difference in mean responses between the treated and control groups. A different random assignment of treatments yields a different estimate of the average causal effect.

The average causal effect varies because our sample is small and the responses have a large variance.

The perfect doctor knows each patient's response to both the treatment and the control, and based on this knowledge assigns each patient to whichever condition gives the better outcome. The perfect doctor thus distorts both averages by filtering out poor responses to both the treatment and control.

The difference between means, which is the supposed average causal effect, is distorted in a direction that depends on the details.
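A minimal simulation of this selection effect, using hypothetical potential outcomes (e.g., years of survival; the numbers are invented for illustration), shows the perfect doctor's assignments producing a misleading difference in means:

```python
# Hypothetical potential outcomes (y_control, y_treat), e.g. years of survival.
patients = [(13, 14), (6, 0), (4, 1), (5, 2), (6, 3), (6, 1), (8, 10), (8, 9)]

# True average causal effect: mean of (y_treat - y_control) over all patients.
true_ate = sum(t - c for c, t in patients) / len(patients)

# The perfect doctor gives each patient whichever condition is better for them.
treated = [t for c, t in patients if t > c]
controls = [c for c, t in patients if t <= c]
observed_diff = sum(treated) / len(treated) - sum(controls) / len(controls)

print(true_ate)       # negative: on average the treatment is harmful
print(observed_diff)  # positive: the observed comparison suggests a benefit
```

Here the treatment is harmful on average, yet the doctor's filtering makes the treated group's observed mean far exceed the control group's, reversing the sign of the apparent effect.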

For more on the connections between the Rubin causal model, structural equation modeling, and other statistical methods for causal inference, see Morgan and Winship (2007),[7] Pearl (2000),[8] Peters et al. (2017),[9] and Ibeling & Icard (2023).[10]

Pearl (2000) argues that all potential outcomes can be derived from Structural Equation Models (SEMs), thus unifying econometrics and modern causal analysis.