Cluster sampling

If a simple random subsample of elements is selected within each sampled cluster, this is referred to as a "two-stage" cluster sampling plan.
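The two stages can be sketched in a few lines of Python. This is a minimal illustration, not a production sampler: the function name `two_stage_sample`, the village/household population, and the choice of simple random sampling without replacement at both stages are assumptions made for the example.

```python
import random

def two_stage_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Draw a two-stage cluster sample.

    clusters: dict mapping a cluster label to a list of its elements.
    Stage 1 picks n_clusters labels by simple random sampling
    without replacement. Stage 2 picks up to n_per_cluster elements
    inside each chosen cluster, again without replacement.
    """
    rng = random.Random(seed)
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for label in chosen:
        elements = clusters[label]
        k = min(n_per_cluster, len(elements))  # small clusters are taken whole
        sample[label] = rng.sample(elements, k)
    return sample

# Hypothetical population: 5 villages with 8 households each.
villages = {f"village_{v}": [f"v{v}_h{h}" for h in range(8)] for v in range(5)}
sample = two_stage_sample(villages, n_clusters=2, n_per_cluster=3, seed=0)
```

Only the two selected villages need to be visited, which is where the cost saving comes from.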

A common motivation for cluster sampling is to reduce the total number of interviews and costs given the desired accuracy.

Relying on the sample drawn from these options will yield an unbiased estimator.

This leads to a more complicated formula for the standard error of the estimator, as well as issues with the optics of the study plan (since the power analysis and the cost estimations often relate to a specific sample size).

The advantage here is that when clusters are selected with probability proportionate to size, the same number of interviews should be carried out in each sampled cluster so that each unit sampled has the same probability of selection.
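The arithmetic behind this equal-probability property can be checked directly. The sketch below assumes a first-stage design in which a cluster of size M_i is included with probability n * M_i / M (with n clusters drawn and M the total population), followed by a simple random sample of a fixed number of units in each selected cluster. The cluster sizes and the function name `inclusion_probabilities` are hypothetical.

```python
from fractions import Fraction

def inclusion_probabilities(cluster_sizes, n_clusters, take):
    """Overall selection probability of a unit in each cluster.

    Assumes n_clusters * size / total <= 1 and take <= size
    for every cluster.
    """
    total = sum(cluster_sizes.values())
    probs = {}
    for label, size in cluster_sizes.items():
        p_cluster = Fraction(n_clusters * size, total)  # stage 1: proportional to size
        p_unit = Fraction(take, size)                   # stage 2: fixed take per cluster
        probs[label] = p_cluster * p_unit               # size cancels out
    return probs

sizes = {"A": 200, "B": 450, "C": 350}  # hypothetical cluster sizes
probs = inclusion_probabilities(sizes, n_clusters=2, take=20)
```

The cluster size cancels in the product, so every unit ends up with probability n * take / M regardless of which cluster it belongs to.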

Because a geographically dispersed population can be expensive to survey, greater economy than simple random sampling can be achieved by grouping several respondents within a local area into a cluster.

Enumeration areas may also be useful as first-stage units for cluster sampling in many types of surveys.

When a population census is outdated, the list of individuals should not be directly used as sampling frame for a socio-economic survey.

[1] Cluster sampling is used to estimate high mortalities in cases such as wars, famines and natural disasters.

In commercial fisheries sampling, the costs of operating at sea are often too large to select hauls individually and at random.

The World Bank has applied adaptive cluster sampling to study informal businesses in developing countries in a cost-efficient manner, as the informal sector is not captured by official records and is too expensive to study through simple random sampling.

Two-stage cluster sampling aims at minimizing survey costs and at the same time controlling the uncertainty related to estimates of interest.

For instance, it can be necessary to cluster at the state or city level, where the number of units may be small and fixed.

If the number of clusters is low, the estimated covariance matrix can be downward biased.

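Therefore, a high value of the ratio between the cluster-adjusted and the unadjusted variance (the Moulton factor, not the number of clusters) means a strong downward bias of the estimated covariance matrix.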
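One standard way to quantify this bias, which the text above does not spell out, is the Moulton factor 1 + (n - 1) * rho, where n is a common cluster size and rho is the intraclass correlation. The sketch below assumes equal-sized clusters and a regressor that is constant within clusters; the numbers are illustrative only.

```python
def moulton_factor(cluster_size, icc):
    """Ratio of cluster-adjusted to unadjusted variance, 1 + (n - 1) * rho.

    Assumes equal cluster sizes and a regressor fixed within clusters.
    """
    return 1 + (cluster_size - 1) * icc

# Hypothetical design: 20 observations per cluster, intraclass correlation 0.05.
factor = moulton_factor(cluster_size=20, icc=0.05)
# Unadjusted standard errors are too small by a factor of sqrt(factor).
se_understatement = factor ** 0.5
```

Even a modest intraclass correlation nearly doubles the variance here, which is why ignoring clustering overstates precision.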

One can use a bias-corrected cluster-robust variance matrix, make t-distribution adjustments, or use bootstrap methods with asymptotic refinements, such as the percentile-t or wild bootstrap, that can lead to improved finite-sample inference.
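As a rough illustration of the first option, the sketch below computes a cluster-robust variance for the simplest possible estimator, a sample mean, and applies the common G / (G - 1) small-sample correction (often labelled CR1). It is not the procedure of any particular paper or package: the function name, the intercept-only setting, and the toy data are assumptions, and the bootstrap alternatives are not shown.

```python
def cluster_robust_variance_of_mean(groups):
    """Return (mean, uncorrected variance, corrected variance).

    groups: dict mapping a cluster label to a list of observations.
    Needs at least two clusters.
    """
    values = [y for ys in groups.values() for y in ys]
    n_obs = len(values)
    n_clusters = len(groups)
    mean = sum(values) / n_obs
    # Sum residuals within each cluster, then square and add up:
    # this is the "meat" of the sandwich estimator.
    meat = sum(sum(y - mean for y in ys) ** 2 for ys in groups.values())
    cr0 = meat / n_obs ** 2
    # With one estimated parameter the (N - 1) / (N - K) term equals 1,
    # so only the G / (G - 1) factor remains.
    cr1 = cr0 * n_clusters / (n_clusters - 1)
    return mean, cr0, cr1

# Toy data: three clusters whose members resemble each other.
data = {"a": [1, 2, 3], "b": [4, 5, 6], "c": [7, 8, 9]}
mean, cr0, cr1 = cluster_robust_variance_of_mean(data)
```

With only three clusters the correction inflates the variance by half, which shows how much the adjustment matters when clusters are few. A t-distribution adjustment would then compare the test statistic with critical values using G - 1 degrees of freedom.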

[10] Cameron, Gelbach and Miller (2008) provide microsimulations for different methods and find that the wild bootstrap performs well in the face of a small number of clusters.

Cluster sampling. A group of twelve people is divided into pairs, and two pairs are then selected at random.
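The twelve-person example can be reproduced as a short sketch. Numbering the people 1 to 12 and pairing consecutive numbers are assumptions made for illustration; any fixed pairing would serve.

```python
import random

rng = random.Random(42)  # seeded only so the run is repeatable

people = list(range(1, 13))
# Form the clusters: six pairs of consecutive people.
pairs = [tuple(people[i:i + 2]) for i in range(0, 12, 2)]
# Select two whole pairs at random, without replacement.
selected_pairs = rng.sample(pairs, 2)
# Everyone in a selected pair is in the sample.
sampled_people = [person for pair in selected_pairs for person in pair]
```

Each person has the same chance of selection (2 in 6), but people in the same pair are always selected together.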