Ratio estimator

Ratio estimates are biased and corrections must be made when they are used in experimental or survey work.

Assume there are two characteristics – x and y – that can be observed for each sampled element in the data set.

An upper bound on the relative bias of the estimate is provided by the coefficient of variation (the ratio of the standard deviation to the mean).

A correction of the bias accurate to the first order[citation needed] can be written in terms of m_x, the mean of the variate x, and s_xy, the covariance between x and y.

To simplify the notation, s_xy will be used subsequently to denote the covariance between the variates x and y.

Another estimator based on the Taylor expansion[3] is written in terms of the sample size n, the population size N, the mean m_x of the x variate, and the sample variances s_x^2 and s_y^2 of the x and y variates respectively.

A computationally simpler but slightly less accurate version of this estimator uses the same quantities: the population size N, the sample size n, the mean m_x of the x variate, and the sample variances s_x^2 and s_y^2 of the x and y variates respectively.
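As an illustration of a first-order correction of this kind, the sketch below uses the standard textbook approximation to the bias of a ratio estimate under simple random sampling without replacement, r - ((N - n)/N)(r s_x^2 - s_xy)/(n m_x^2), where s_xy equals the sample correlation multiplied by s_x s_y. That this is exactly the expression intended above is an assumption, and the function names are hypothetical.

```python
from statistics import mean, variance


def sample_covariance(x, y):
    """Sample covariance of x and y (divisor n - 1)."""
    m_x, m_y = mean(x), mean(y)
    return sum((a - m_x) * (b - m_y) for a, b in zip(x, y)) / (len(x) - 1)


def bias_corrected_ratio(x, y, N):
    """Ratio estimate with an assumed first-order (Taylor) bias correction.

    Assumed form: r - (N - n) / N * (r * s_x^2 - s_xy) / (n * m_x^2)
    """
    n = len(x)
    m_x = mean(x)
    r = mean(y) / m_x                # uncorrected ratio estimate
    s_x2 = variance(x)               # sample variance of x (divisor n - 1)
    s_xy = sample_covariance(x, y)   # equals rho * s_x * s_y
    fpc = (N - n) / N                # finite population correction
    return r - fpc * (r * s_x2 - s_xy) / (n * m_x ** 2)
```

When y is exactly proportional to x the term r s_x^2 - s_xy is zero and the estimate is left unchanged; the correction also vanishes when n = N, that is, when the whole population has been observed.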

[10] An alternative method is to divide the sample into g groups each of size p with n = pg.
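One well-known estimator built on such a grouping is the Quenouille-type jackknife, which combines the full-sample ratio with the ratios recomputed after leaving out each group in turn. The sketch below shows that technique; whether it is the exact estimator meant here is an assumption, the function name is hypothetical, and the groups are taken as consecutive blocks on the assumption that the sample is already in random order.

```python
def jackknife_ratio(x, y, g):
    """Quenouille-type jackknife of r = sum(y) / sum(x) over g equal groups."""
    n = len(x)
    if n != len(y) or g < 2 or n % g != 0:
        raise ValueError("need len(x) == len(y) and n = p * g with g >= 2")
    p = n // g
    sum_x, sum_y = sum(x), sum(y)
    r = sum_y / sum_x  # ratio estimate from the full sample

    # Ratio estimate recomputed with each block of p consecutive elements left out.
    leave_one_out = []
    for j in range(g):
        group_x = sum(x[j * p:(j + 1) * p])
        group_y = sum(y[j * p:(j + 1) * p])
        leave_one_out.append((sum_y - group_y) / (sum_x - group_x))

    # Jackknife combination: g * r - (g - 1) * mean of the leave-one-out ratios.
    return g * r - (g - 1) * sum(leave_one_out) / g
```

With g = n (so p = 1) this reduces to the ordinary delete-one jackknife.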

For normally distributed x and y variates, an approximation to the skewness of the ratio has been given.[7]

Because the ratio estimate is generally skewed, confidence intervals created with the variance and symmetrical tests such as the t test are incorrect.

Note that while many applications, such as those discussed in Lohr,[13] are intended to be restricted to positive integers only, such as sizes of sample groups, the Midzuno-Sen method works for any sequence of positive numbers, integral or not.

It is not clear in what sense Lahiri's method can be said to work, since it returns a biased result.

Lahiri's scheme, as described by Lohr, is biased high and so is of interest only for historical reasons.

In 1952 Midzuno and Sen independently described a sampling scheme that provides an unbiased estimator of the ratio.

The probability of selection under this scheme is

$$P = \frac{\sum_{i=1}^{n} x_i}{\binom{N-1}{n-1} X}$$

where X is the sum of the N x variates and the x_i are the n members of the sample.
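The scheme can be sketched in code under its usual description: the first unit is drawn with probability proportional to x and the remaining n - 1 units are drawn by simple random sampling without replacement from the other N - 1 units, which gives each sample a selection probability equal to its sum of x values divided by C(N-1, n-1) times X. The function names and the `rng` parameter are hypothetical, chosen for illustration only.

```python
import random
from math import comb


def midzuno_sen_sample(x, n, rng=random):
    """Return the indices of a sample of size n drawn from the population x.

    The first unit is drawn with probability proportional to x; the other
    n - 1 units are drawn with equal probability, without replacement,
    from the remaining N - 1 units.
    """
    N = len(x)
    first = rng.choices(range(N), weights=x, k=1)[0]
    others = [i for i in range(N) if i != first]
    return [first] + rng.sample(others, n - 1)


def selection_probability(x, sample):
    """Probability of drawing this set of indices: sum(x_i) / (C(N-1, n-1) * X)."""
    N, n = len(x), len(sample)
    X = sum(x)  # total of the x variate over the whole population
    return sum(x[i] for i in sample) / (comb(N - 1, n - 1) * X)
```

Because the selection probability is proportional to the sample sum of x, averaging the sample ratio (sum of y over sum of x) across all possible samples, weighted by these probabilities, returns the population ratio exactly. This is the sense in which the estimator is unbiased under the scheme.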

These ratio estimators are commonly used to calculate pollutant loads from sampling of waterways, particularly where flow is measured more frequently than water quality.

If a linear relationship between the x and y variates exists and the regression equation passes through the origin, then the estimated variance of the regression equation is always less than that of the ratio estimator.[citation needed]

Later Messance (~1765) and Moheau (1778) published very carefully prepared estimates for France based on enumeration of population in certain districts and on the count of births, deaths and marriages as reported for the whole country.

The districts from which the ratio of inhabitants to births was determined constituted only a sample.

No population census had been carried out and Laplace lacked the resources to count every individual.

The total number of baptismal registrations for France was also available to him and he assumed that the ratio of live births to population was constant.