In statistics and econometrics, set identification (or partial identification) extends the concept of identifiability (or "point identification") in statistical models to environments where the model and the distribution of observable variables are not sufficient to determine a unique value for the model parameters, but instead constrain the parameters to lie in a strict subset of the parameter space.
Statistical models that are set (or partially) identified arise in a variety of settings in economics, including game theory and the Rubin causal model.
Unlike approaches that deliver point identification of the model parameters, methods from the literature on partial identification are used to obtain set estimates that are valid under weaker modelling assumptions.[1]
Early works containing the main ideas of set identification included Frisch (1934) and Marschak & Andrews (1944).
However, the methods were significantly developed and promoted by Charles Manski, beginning with Manski (1989) and Manski (1990).
Partial identification continues to be a major theme in research in econometrics.
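Powell (2017) named partial identification as an example of theoretical progress in the econometrics literature, and Bonhomme & Shaikh (2017) listed partial identification as “one of the most prominent recent themes in econometrics.”

Let $U$ denote a vector of latent variables, let $Z$ denote a vector of observed (possibly endogenous) explanatory variables, and let $Y$ denote a vector of observed endogenous outcome variables.
A structure is a pair $s = (h, \mathcal{P}_{U \mid Z})$, where $\mathcal{P}_{U \mid Z}$ represents a collection of conditional distributions, and $h$ is a structural function such that $h(y, z, u) = 0$ for all realizations $(y, z, u)$ of the random vectors $(Y, Z, U)$.
A model is a collection of admissible (i.e. possible) structures $s$.

Let $\mathcal{P}_{Y \mid Z}(s)$ denote the collection of conditional distributions of $Y \mid Z$ consistent with the structure $s$.
Two admissible structures $s$ and $s'$ are said to be observationally equivalent if $\mathcal{P}_{Y \mid Z}(s) = \mathcal{P}_{Y \mid Z}(s')$.
Let $s^\star$ denote the true (i.e. data-generating) structure.
The model is said to be point-identified if $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$ for every admissible $s \neq s^\star$.
More generally, the model is said to be set (or partially) identified if there exists at least one admissible $s \neq s^\star$ such that $\mathcal{P}_{Y \mid Z}(s) \neq \mathcal{P}_{Y \mid Z}(s^\star)$.
The identified set of structures is the collection of admissible structures that are observationally equivalent to $s^\star$.[4]

In most cases the definition can be substantially simplified.
In particular, when the latent vector $U$ is independent of $Z$ and has a known (up to some finite-dimensional parameter) distribution, and when the structural function $h$ is known up to some finite-dimensional vector of parameters, each structure $s$ can be characterized by a finite-dimensional parameter vector $\theta \in \Theta$.
If $\theta_0$ denotes the true (i.e. data-generating) vector of parameters, then the identified set, often denoted as $\Theta_I \subseteq \Theta$, is the set of parameter values that are observationally equivalent to $\theta_0$.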
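
Suppose there are two binary random variables, $Y$ and $Z$, and that the object of interest is $P(Y = 1)$.
There is a missing data problem, however: $Y$ can only be observed if $Z = 1$.
By the law of total probability,

$$P(Y = 1) = P(Y = 1 \mid Z = 1)\,P(Z = 1) + P(Y = 1 \mid Z = 0)\,P(Z = 0).$$

The only unknown object is $P(Y = 1 \mid Z = 0)$, which is constrained to lie between 0 and 1.
Therefore, the identified set is

$$\Theta_I = \{\, p \in [0, 1] : p = P(Y = 1 \mid Z = 1)\,P(Z = 1) + q\,P(Z = 0) \text{ for some } q \in [0, 1] \,\}.$$

Given the missing data constraint, the econometrician can only say that $P(Y = 1) \in \Theta_I$.
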
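
The bounds in this missing-data example can be computed directly: setting the unknown probability P(Y = 1 | Z = 0) to 0 gives the lower endpoint P(Y = 1 | Z = 1) P(Z = 1), and setting it to 1 gives the upper endpoint P(Y = 1 | Z = 1) P(Z = 1) + P(Z = 0). The following is a minimal sketch in Python; the function name `worst_case_bounds` and the input values 0.6 and 0.8 are illustrative assumptions, not taken from the literature cited here.

```python
from typing import Tuple


def worst_case_bounds(p_y1_given_z1: float, p_z1: float) -> Tuple[float, float]:
    """Return (lower, upper) endpoints of the identified set for P(Y=1)
    when Y is observed only if Z=1.

    Both arguments are population quantities that the observable data
    identify: P(Y=1 | Z=1) and P(Z=1).
    """
    if not (0.0 <= p_y1_given_z1 <= 1.0 and 0.0 <= p_z1 <= 1.0):
        raise ValueError("probabilities must lie in [0, 1]")

    observed_part = p_y1_given_z1 * p_z1  # P(Y=1, Z=1)
    p_z0 = 1.0 - p_z1                     # P(Z=0)

    lower = observed_part          # unknown P(Y=1 | Z=0) set to 0
    upper = observed_part + p_z0   # unknown P(Y=1 | Z=0) set to 1
    return lower, upper


# Hypothetical inputs: 80% of outcomes observed, 60% of those equal 1.
lower, upper = worst_case_bounds(p_y1_given_z1=0.6, p_z1=0.8)
print(f"P(Y=1) lies in [{lower:.2f}, {upper:.2f}]")  # prints "P(Y=1) lies in [0.48, 0.68]"
```

The width of the interval equals P(Z = 0), so the identified set collapses to a single point only when no outcomes are missing.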
Set estimation cannot rely on the usual tools for statistical inference developed for point estimation.
A literature in statistics and econometrics studies methods for statistical inference in the context of set-identified models, focusing on constructing confidence intervals or confidence regions with appropriate properties.
For example, a method developed by Chernozhukov, Hong & Tamer (2007) constructs confidence regions that cover the identified set with a given probability.
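
To illustrate what a confidence region covering the identified set looks like, the sketch below treats the case of two binary variables Y and Z in which Y is observed only when Z = 1, so that P(Y = 1) is bounded below by P(Y = 1, Z = 1) and above by P(Y = 1, Z = 1) + P(Z = 0). It is not the criterion-function procedure of Chernozhukov, Hong & Tamer (2007); it is a simpler Bonferroni-style construction, assuming an i.i.d. sample and a normal approximation for each estimated endpoint. The function name and the sample data are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def identified_set_confidence_interval(y, z, alpha=0.05):
    """Estimate the bounds on P(Y=1) and an interval that covers the whole
    identified set with asymptotic probability of at least 1 - alpha.

    y[i] must be 0 or 1 when z[i] == 1; it is ignored (and may be None)
    when z[i] == 0, because the outcome is then unobserved.
    """
    n = len(z)
    if n == 0 or len(y) != n:
        raise ValueError("y and z must be non-empty and of equal length")

    # Per-observation terms whose sample means estimate the two endpoints.
    lower_terms = [yi if zi == 1 else 0 for yi, zi in zip(y, z)]  # 1{Y=1, Z=1}
    upper_terms = [yi if zi == 1 else 1 for yi, zi in zip(y, z)]  # 1{Y=1, Z=1} + 1{Z=0}

    lower_hat = sum(lower_terms) / n
    upper_hat = sum(upper_terms) / n

    # Both term lists are binary, so the usual binomial standard error applies.
    se_lower = sqrt(lower_hat * (1.0 - lower_hat) / n)
    se_upper = sqrt(upper_hat * (1.0 - upper_hat) / n)

    # Bonferroni: each endpoint is allowed an error probability of alpha / 2.
    crit = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ci_low = max(0.0, lower_hat - crit * se_lower)
    ci_high = min(1.0, upper_hat + crit * se_upper)
    return (lower_hat, upper_hat), (ci_low, ci_high)


# Hypothetical sample: 100 units, 20 with missing outcomes.
z = [1] * 80 + [0] * 20
y = [1] * 48 + [0] * 32 + [None] * 20
(bound_low, bound_high), (ci_low, ci_high) = identified_set_confidence_interval(y, z)
print(f"estimated identified set: [{bound_low:.2f}, {bound_high:.2f}]")
print(f"confidence interval:      [{ci_low:.3f}, {ci_high:.3f}]")
```

Because the interval is built to cover both endpoints at once, it is conservative; procedures in the literature can yield tighter regions or target coverage of the true parameter value instead of the whole identified set.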