When researchers use complex methods to select their sample, they use the design effect to check and, where needed, adjust the precision of their results.
Generally, the design effect varies among different statistics of interest, such as the total or the ratio mean.
In such cases, the level of correlation between an element's probability of selection and its measured outcome can directly influence the resulting design effect.
In a 1995 paper,[5]: 73 Kish mentions that a similar concept, termed "Lexis ratio", was described at the end of the 19th century.
The closely related intraclass correlation was described by Fisher in 1950, while computations of ratios of variances had already been published by Kish and others from the late 1940s through the 1950s.
[6][4] In his 1995 paper, Kish proposed that considering the design effect is necessary when averaging the same measured quantity from multiple surveys conducted over a period of time.
[5]: 57–62 He also suggested that the design effect should be considered when extrapolating from the error of simple statistics (e.g. the mean) to more complex ones (e.g. regression coefficients).
For example, consider a multistage design with primary sampling units (PSUs) selected systematically with probability proportional to some measure of size from a list sorted in a particular way (say, by number of households in each PSU).
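As an illustration, here is a minimal sketch of such a systematic PPS selection of PSUs. All sizes and counts below are hypothetical, and real designs add further stages and checks:

```python
import numpy as np

rng = np.random.default_rng(42)

# Hypothetical measures of size (e.g., households per PSU),
# sorted as in the example above
sizes = np.sort(rng.integers(50, 500, size=200))
n_psu = 10  # number of PSUs to draw

# Systematic PPS: pick a random start in [0, interval),
# then step through the cumulative sizes at a fixed interval
cum = np.cumsum(sizes)
interval = cum[-1] / n_psu
points = rng.uniform(0, interval) + interval * np.arange(n_psu)

# Each point falls inside exactly one PSU's cumulative-size segment;
# note a PSU larger than the interval could be hit more than once
selected = np.searchsorted(cum, points, side="right")
print("Selected PSU indices:", selected)
```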
[10] A related quantity is the effective sample size ratio, which can be calculated by simply taking the inverse of the design effect.
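In symbols (a sketch using the usual definitions, where n is the nominal sample size):

\[
\text{effective sample size ratio} = \frac{1}{\mathrm{Deff}}, \qquad n_{\mathrm{eff}} = \frac{n}{\mathrm{Deff}}.
\]

For example, a design effect of 2 with n = 1,000 respondents yields an effective sample size of 500.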
For example, in the cluster sampling case, the units may have equal or unequal selection probabilities, irrespective of their intraclass correlation (and its adverse effect of increasing the variance of the estimators).
We might decide (for practical reasons) to collect responses from only two people in each household (i.e., a sampled cluster), which could lead to more complex post-sampling adjustments to deal with unequal selection probabilities.
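A minimal sketch of the resulting weights, assuming (hypothetically) that at most two members are interviewed per household, and ignoring the households' own selection probabilities and the clustering itself:

```python
import numpy as np

# Hypothetical sizes of the sampled households (clusters)
household_sizes = np.array([2, 3, 5, 4, 2, 6])

# At most 2 members interviewed per household, so a member's
# within-household selection probability is min(2, H) / H
respondents_per_hh = np.minimum(2, household_sizes)
w = np.repeat(household_sizes / respondents_per_hh, respondents_per_hh)

# Kish's design effect from unequal weighting alone:
# Deff = n * sum(w^2) / (sum(w))^2
n = w.size
deff_w = n * np.sum(w**2) / np.sum(w) ** 2
print(f"weights: {w}\nDeff from weighting alone: {deff_w:.3f}")
```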
This might happen when making adjustments for issues such as non-coverage, non-response, or an unexpected stratification of the population that was not available during the initial sampling stage.
For example, when we use post-stratification based on age and gender, it is assumed that these variables can explain a significant portion of the bias in the sample.
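For illustration, here is a minimal post-stratification sketch; the cell labels, base weights, and population counts below are all hypothetical:

```python
import pandas as pd

# Hypothetical sample with base weights and post-stratification
# cells defined by age group x gender
sample = pd.DataFrame({
    "cell":   ["F18-34", "F18-34", "M18-34", "F35+", "M35+", "M35+"],
    "weight": [1.0, 1.2, 0.9, 1.1, 1.0, 1.3],
})

# Hypothetical known population counts for each cell
pop_counts = {"F18-34": 300, "M18-34": 280, "F35+": 250, "M35+": 270}

# Rescale weights so each cell's weighted total matches its population count
cell_totals = sample.groupby("cell")["weight"].transform("sum")
sample["ps_weight"] = sample["weight"] * sample["cell"].map(pop_counts) / cell_totals
print(sample)
```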
Either way, even when estimators (such as propensity score models) capture most of the sampling design, using the weights can make a small or a large difference, depending on the specific dataset.
[11] Sometimes, these different design effects can be compounded together (as in the case of unequal selection probability combined with cluster sampling; more details in the following sections).
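For example, a commonly used approximation multiplies the two components. This is a sketch under the usual assumptions behind Kish's formulas, with \(cv(w)\) the coefficient of variation of the weights, \(\bar{b}\) the average cluster size, and \(\rho\) the intraclass correlation:

\[
\mathrm{Deff} \;\approx\; \underbrace{\left(1 + cv^2(w)\right)}_{\text{unequal weighting}} \times \underbrace{\left(1 + (\bar{b} - 1)\rho\right)}_{\text{clustering}}.
\]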
Whether to use these formulas or simply assume SRS depends on the expected amount of bias reduction versus the increase in estimator variance (and the overhead of methodological and technical complexity).
[20][17]: 132 [21]: 1 The two primary frameworks for analyzing the properties of calibration estimators are the randomization-based (design-based) framework and the model-based framework.[17]: 133–134 [22] As we will see later, some proofs in the literature rely on the randomization-based framework, while others focus on the model-based perspective.
It is also required to assume that the weights themselves are not random variables but rather known constants (e.g., the inverse of the probability of selection, for some predetermined and known sampling design).
[citation needed] The following is a simplified proof for the case in which there are no clusters (i.e., no intraclass correlation between elements of the sample) and each stratum includes only one observation:[28]
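A sketch of the standard argument, assuming the weights \(w_i\) are fixed constants and the observations are independent with common variance \(\sigma^2\): the variance of the weighted mean is

\[
\operatorname{Var}(\bar{y}_w) \;=\; \operatorname{Var}\!\left(\frac{\sum_i w_i y_i}{\sum_i w_i}\right) \;=\; \sigma^2\,\frac{\sum_i w_i^2}{\left(\sum_i w_i\right)^2},
\]

and dividing by the SRS variance \(\sigma^2 / n\) gives

\[
\mathrm{Deff} \;=\; \frac{n \sum_i w_i^2}{\left(\sum_i w_i\right)^2} \;=\; 1 + cv^2(w),
\]

where \(cv(w)\) is the coefficient of variation of the weights.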
In such a case, while the design effect formula might still be correct (if the other conditions are met), it would require a different estimator for the variance of the weighted mean.
[26]: 318 [13]: 396 Reflecting on this, Park and Lee (2006) stated that "The rationale behind [...][Kish's] derivation is that the loss in precision of [the weighted mean] due to haphazard unequal weighting can be approximated by the ratio of the variance under disproportionate stratified sampling to that under the proportionate stratified sampling".
This is because stratified sampling removes some of the variability in the specific number of elements per stratum that occurs under SRS.
[36][28]: 105 In such cases, Kish's formula (using the average cluster weight) serves as a conservative estimate (an upper bound) of the exact design effect.
[j] A model-based justification for this formula was provided by Gabler et al.[28] In 2000, Liu and Aragon proposed a decomposition of the design effect of unequal selection probabilities for different strata in stratified sampling.
[39] In 2002, Liu et al. extended that work to account for stratified samples in which each stratum has its own set of unequal selection probability weights.
Later work extends the model-based justification of Kish's 1987 formula for design effects proposed by Gabler et al.
[41] The modified formulae define the overall design effect using survey weights and population intracluster correlations.
Lohr presents conditions under which the GLS estimator of the regression slope has a design effect less than 1, indicating higher efficiency.
In contrast, the OLS estimator of the regression slope and the design effect calculated from a design-based perspective are robust to misspecification of the variance structure, making them more reliable in situations where the model specification may not be accurate.
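A small numerical sketch of this phenomenon follows. The number of clusters, cluster size, intraclass correlation, and error variance are all hypothetical, assuming exchangeable within-cluster correlation and a covariate that varies within clusters:

```python
import numpy as np

rng = np.random.default_rng(0)
C, m, rho, sigma2 = 100, 5, 0.3, 1.0  # clusters, cluster size, ICC, error variance

# Design matrix: intercept plus one covariate varying within clusters
n = C * m
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])

# Exchangeable within-cluster covariance, block-diagonal across clusters:
# Sigma_block = sigma2 * [(1 - rho) I + rho J]
block = sigma2 * ((1 - rho) * np.eye(m) + rho * np.ones((m, m)))
Sigma = np.kron(np.eye(C), block)

XtX_inv = np.linalg.inv(X.T @ X)
var_ols = XtX_inv @ X.T @ Sigma @ X @ XtX_inv             # true Var of OLS under clustering (sandwich)
var_gls = np.linalg.inv(X.T @ np.linalg.solve(Sigma, X))  # Var of GLS with known Sigma
var_srs = sigma2 * XtX_inv                                # Var of OLS if errors were independent

deff_ols = var_ols[1, 1] / var_srs[1, 1]
deff_gls = var_gls[1, 1] / var_srs[1, 1]
print(f"Deff(OLS slope) = {deff_ols:.3f}, Deff(GLS slope) = {deff_gls:.3f}")
```

With these settings the GLS slope shows a design effect below 1, since GLS exploits the within-cluster correlation, while the OLS slope stays close to 1 because the covariate has essentially no cluster structure.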
For instance, Taylor linearization is used to construct confidence intervals based on the variance of the weighted mean.
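A minimal sketch of such an interval for the weighted (ratio) mean, assuming independent observations with fixed weights (no clustering or stratification); all numbers below are hypothetical:

```python
import numpy as np

def weighted_mean_ci(y, w, z=1.96):
    """Taylor-linearization confidence interval for the weighted (ratio) mean,
    assuming independent observations and fixed weights."""
    ybar = np.sum(w * y) / np.sum(w)
    # Linearized residuals of the ratio estimator (they sum to zero by construction)
    z_i = w * (y - ybar) / np.sum(w)
    n = y.size
    var = n / (n - 1) * np.sum(z_i**2)
    se = np.sqrt(var)
    return ybar - z * se, ybar + z * se

y = np.array([2.0, 3.5, 1.0, 4.0, 2.5])
w = np.array([1.0, 2.0, 1.5, 1.0, 2.5])
print(weighted_mean_ci(y, w))
```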