Weighted arithmetic mean

The weighted arithmetic mean is similar to an ordinary arithmetic mean (the most common type of average), except that instead of each of the data points contributing equally to the final average, some data points contribute more than others.
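As a minimal illustration, the following Python sketch computes a weighted mean directly from its definition, $\bar{x} = \sum_i w_i x_i / \sum_i w_i$ (the example values are arbitrary):

```python
def weighted_mean(x, w):
    """Weighted arithmetic mean: sum(w_i * x_i) / sum(w_i)."""
    if len(x) != len(w) or not x:
        raise ValueError("x and w must be non-empty sequences of equal length")
    return sum(wi * xi for wi, xi in zip(w, x)) / sum(w)

# A grade of 80 counted twice as heavily as a grade of 90:
print(weighted_mean([80, 90], [2, 1]))  # 83.33..., vs. the ordinary mean 85.0
```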

The notion of weighted mean plays a role in descriptive statistics and also occurs in a more general form in several other areas of mathematics.

If the data elements are independent and identically distributed random variables with variance $\sigma^2$, the standard error of the weighted mean, $\sigma_{\bar{x}}$, can be shown via uncertainty propagation to be:

$\sigma_{\bar{x}} = \sigma \sqrt{\sum_{i=1}^n {w_i'}^2}$

where $w_i' = w_i / \sum_{j=1}^n w_j$ are the normalized weights.

For the weighted mean of a list of data for which each element $x_i$ potentially comes from a different probability distribution with known variance $\sigma_i^2$, all having the same mean, one possible choice for the weights is given by the reciprocal of variance, $w_i = 1/\sigma_i^2$. The weighted mean in this case is

$\bar{x} = \frac{\sum_{i=1}^n x_i / \sigma_i^2}{\sum_{i=1}^n 1/\sigma_i^2}$

and its standard error is $\sigma_{\bar{x}} = \sqrt{1 \big/ \sum_{i=1}^n \sigma_i^{-2}}$.
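The following sketch (with arbitrary illustrative values) evaluates both formulas: the propagated standard error for i.i.d. data, and the inverse-variance weighted mean with its standard error:

```python
import numpy as np

# i.i.d. case: sigma_xbar = sigma * sqrt(sum of squared normalized weights)
w = np.array([1.0, 2.0, 1.0, 4.0])            # arbitrary weights
w_norm = w / w.sum()                          # normalized weights w_i'
sigma = 0.3                                   # common standard deviation (assumed)
print(sigma * np.sqrt(np.sum(w_norm**2)))     # standard error of the weighted mean

# Heteroscedastic case: inverse-variance weights w_i = 1 / sigma_i^2
x = np.array([10.2, 9.8, 10.5, 10.1])         # observations (illustrative)
sigma_i = np.array([0.2, 0.4, 0.3, 0.25])     # known per-observation std. deviations
w_inv = 1.0 / sigma_i**2
print(np.sum(w_inv * x) / np.sum(w_inv))      # inverse-variance weighted mean
print(np.sqrt(1.0 / np.sum(w_inv)))           # its standard error
```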

From a model-based perspective, we are interested in estimating the variance of the weighted mean when the different $y_i$ are not i.i.d. random variables.

An alternative perspective for this problem is that of some arbitrary sampling design of the data in which units are selected with unequal probabilities (with replacement).

The survey sampling procedure yields a series of Bernoulli indicator values ($I_i$) that take the value 1 if observation $i$ is in the sample and 0 if it was not selected.

When working with a weighted mean, the quantity of interest is the ratio $\bar{y}_w = \frac{\sum_{i=1}^n w_i y_i}{\sum_{i=1}^n w_i}$. This will be the estimand for specific values of y and w, but the statistical properties come when including the indicator variable $I_i$:

$\bar{y}_w = \frac{\sum_{i=1}^n I_i w_i y_i}{\sum_{i=1}^n I_i w_i}$
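A small simulation sketch can make the indicator view concrete. The population values and inclusion probabilities below are arbitrary; the only randomness across replications is in the indicators $I_i$:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 1_000
y = rng.normal(50, 10, size=N)        # fixed population values (illustrative)
pi = rng.uniform(0.05, 0.5, size=N)   # unequal inclusion probabilities pi_i

estimates = []
for _ in range(2_000):                # replicate the sampling design
    I = rng.random(N) < pi            # Bernoulli indicators I_i
    w = 1.0 / pi[I]                   # inverse-probability weights for sampled units
    estimates.append(np.sum(w * y[I]) / np.sum(w))

print(np.mean(estimates), y.mean())   # estimator centers near the population mean
print(np.var(estimates))              # the design variance we seek to estimate
```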

Since there is no closed analytical form to compute this variance, various methods are used for approximate estimation, primarily Taylor series first-order linearization, asymptotics, and bootstrap/jackknife.[2]: 172 The Taylor linearization method could lead to under-estimation of the variance for small sample sizes in general, but that depends on the complexity of the statistic.

For the weighted mean, the approximate variance is supposed to be relatively accurate even for medium sample sizes.

When the sampling has a random sample size (as in Poisson sampling), the variance of the weighted mean can be estimated as[2]: 182

$\widehat{V(\bar{y}_w)} = \frac{n}{(n-1)} \frac{1}{\left(\sum_{i=1}^n w_i\right)^2} \sum_{i=1}^n \left(w_i y_i - \bar{w} \bar{y}_w\right)^2$

where $\bar{w} = \frac{1}{n}\sum_{i=1}^n w_i$ is the mean of the weights. The estimator is invariant to rescaling of the weights: if we scale the sum of weights to be equal to a known-from-before population size N, the variance calculation would look the same.

When all weights are equal to one another, this formula reduces to the standard unbiased estimator of the variance of the mean, $s^2/n$.
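This reduction, and the scale invariance noted above, can be checked numerically with a small sketch (function name and values are illustrative):

```python
import numpy as np

def var_weighted_mean(y, w):
    """n/((n-1)(sum w)^2) * sum_i (w_i y_i - w_bar * ybar_w)^2"""
    n = len(y)
    ybar_w = np.sum(w * y) / np.sum(w)
    w_bar = np.mean(w)
    return n / (n - 1) * np.sum((w * y - w_bar * ybar_w) ** 2) / np.sum(w) ** 2

rng = np.random.default_rng(1)
y = rng.normal(size=20)
w = np.ones(20)                           # all weights equal
print(var_weighted_mean(y, w))            # equals s^2 / n ...
print(np.var(y, ddof=1) / len(y))         # ... the usual estimator for the mean's variance

# Rescaling the weights leaves the estimate unchanged:
print(np.isclose(var_weighted_mean(y, 7 * w), var_weighted_mean(y, w)))
```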

Applying first-order Taylor linearization to the ratio of the two totals $\hat{Y} = \sum_{i=1}^n I_i w_i y_i$ and $\hat{Z} = \sum_{i=1}^n I_i w_i$ gives, with $R = \hat{Y}/\hat{Z}$:

$V\!\left(\frac{\hat{Y}}{\hat{Z}}\right) \approx \frac{1}{\hat{Z}^2}\left[ V(\hat{Y}) + R^2\, V(\hat{Z}) - 2R\, C(\hat{Y}, \hat{Z}) \right]$

The covariance term helps illustrate that this formula incorporates the effect of correlation between y and z on the variance of the ratio estimator.

A similar re-creation of the proof (up to some mistakes at the end) was provided by Thomas Lumley on Cross Validated.[3]

We have (at least) two versions of variance for the weighted mean: one for when the population size is known and one for when it must be estimated.

It has been shown, by Gatz et al. (1995), that in comparison to bootstrapping methods, the following (variance estimation of ratio-mean using Taylor series linearization) is a reasonable estimation for the square of the standard error of the mean (when used in the context of measuring chemical constituents):[4]: 1186

$\widehat{\sigma_{\bar{x}_w}^2} = \frac{n}{(n-1)(n\bar{w})^2} \left[ \sum_{i=1}^n \left(w_i x_i - \bar{w}\bar{x}_w\right)^2 - 2\bar{x}_w \sum_{i=1}^n \left(w_i - \bar{w}\right)\left(w_i x_i - \bar{w}\bar{x}_w\right) + \bar{x}_w^2 \sum_{i=1}^n \left(w_i - \bar{w}\right)^2 \right]$

where $\bar{w} = \frac{1}{n}\sum_{i=1}^n w_i$ is the mean of the weights and $\bar{x}_w$ is the weighted mean.

Further simplification leads to

$\widehat{\sigma_{\bar{x}_w}^2} = \frac{n}{(n-1)(n\bar{w})^2} \sum_{i=1}^n w_i^2 \left(x_i - \bar{x}_w\right)^2$

Gatz et al. mention that the above formulation was published by Endlich et al. (1988) when treating the weighted mean as a combination of a weighted total estimator divided by an estimator of the population size,[5] based on the formulation published by Cochran (1977), as an approximation to the ratio mean.

However, Endlich et al. didn't seem to publish this derivation in their paper (even though they mention they used it), and Cochran's book includes a slightly different formulation.
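The simplified formulation above translates directly into code; the sketch below (function and variable names are illustrative) computes the Taylor-linearization estimate of the squared standard error:

```python
import numpy as np

def se2_taylor(x, w):
    """Taylor-linearization estimate of the squared standard error of the
    weighted mean: n/((n-1)(n*w_bar)^2) * sum_i w_i^2 (x_i - xbar_w)^2."""
    n = len(x)
    w_bar = np.mean(w)
    xbar_w = np.sum(w * x) / np.sum(w)
    return n / (n - 1) * np.sum(w**2 * (x - xbar_w) ** 2) / (n * w_bar) ** 2

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9])   # illustrative measurements
w = np.array([0.8, 1.2, 1.0, 0.5, 1.5])   # illustrative weights
print(np.sqrt(se2_taylor(x, w)))          # approximate standard error of xbar_w
```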

Because there is no closed analytical form for the variance of the weighted mean, it was proposed in the literature to rely on replication methods such as the jackknife and the bootstrap.
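As one sketch of the bootstrap approach: resampling the $(x_i, w_i)$ pairs with replacement is a common naive choice, though the appropriate replication scheme depends on the sampling design:

```python
import numpy as np

def bootstrap_var(x, w, B=5_000, seed=0):
    """Bootstrap variance of the weighted mean, resampling (x_i, w_i) pairs."""
    rng = np.random.default_rng(seed)
    n = len(x)
    stats = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)  # resample indices with replacement
        stats[b] = np.sum(w[idx] * x[idx]) / np.sum(w[idx])
    return stats.var(ddof=1)

x = np.array([4.2, 5.1, 3.8, 6.0, 4.9])
w = np.array([0.8, 1.2, 1.0, 0.5, 1.5])
print(bootstrap_var(x, w))                # compare with the linearized estimate above
```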

For uncorrelated observations with variances $\sigma_i^2$, the variance of the weighted sample mean is[citation needed]

$\sigma_{\bar{x}}^2 = \sum_{i=1}^n {w_i'}^2 \sigma_i^2$

whose square root $\sigma_{\bar{x}}$ can be called the standard error of the weighted mean (general case).[citation needed]

Consequently, if all the observations have equal variance, $\sigma_i^2 = \sigma_0^2$, the weighted sample mean has variance

$\sigma_{\bar{x}}^2 = \sigma_0^2 \sum_{i=1}^n {w_i'}^2$

where $1/n \le \sum_{i=1}^n {w_i'}^2 \le 1$. The variance attains its maximum value, $\sigma_0^2$, when all weights except one are zero. Its minimum value is found when all weights are equal (i.e., unweighted mean), in which case we have $\sigma_{\bar{x}} = \sigma_0 / \sqrt{n}$, i.e., the standard error of the mean.
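A toy numerical check of these bounds on $\sum_i {w_i'}^2$:

```python
import numpy as np

n = 10
equal = np.full(n, 1.0 / n)               # equal normalized weights
concentrated = np.zeros(n)
concentrated[0] = 1.0                     # all weight on a single observation

print(np.sum(equal**2))                   # 1/n: minimum, variance sigma_0^2 / n
print(np.sum(concentrated**2))            # 1:   maximum, variance sigma_0^2
```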

If the weights are frequency weights, where weight $w_i$ equals the number of occurrences of value $x_i$, the unbiased estimator of the variance is

$s^2 = \frac{\sum_{i=1}^n w_i \left(x_i - \bar{x}^*\right)^2}{\left(\sum_{i=1}^n w_i\right) - 1}$

where $\bar{x}^* = \sum_{i=1}^n w_i x_i \big/ \sum_{i=1}^n w_i$ is the weighted mean; this effectively applies Bessel's correction with the population count $\sum_i w_i$. In any case, the information on the total number of samples is necessary in order to obtain an unbiased correction, even if $w_i$ has a different meaning other than frequency weight.

The estimator can be unbiased only if the weights are neither standardized nor normalized, as these processes change the data's mean and variance and thus lead to a loss of the base rate (the population count, which is a requirement for Bessel's correction).

If the weights are instead reliability weights (non-random values reflecting the sample's relative trustworthiness, often derived from sample variance), we can determine a correction factor to yield an unbiased estimator.

The unbiased weighted estimator of the sample variance, $s^2$, with Bessel's correction, is given by:[8]

$s^2 = \frac{\sum_{i=1}^n w_i \left(x_i - \bar{x}^*\right)^2}{V_1 - V_2 / V_1}$

where $V_1 = \sum_{i=1}^n w_i$ and $V_2 = \sum_{i=1}^n w_i^2$. As noted above, this estimator is unbiased only if the weights are neither standardized nor normalized.
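In code, the correction simply replaces the usual $n-1$ denominator with $V_1 - V_2/V_1$; a sketch:

```python
import numpy as np

def weighted_var_reliability(x, w):
    """Unbiased weighted sample variance under reliability weights:
    sum_i w_i (x_i - xbar)^2 / (V1 - V2/V1), V1 = sum w_i, V2 = sum w_i^2."""
    V1, V2 = np.sum(w), np.sum(w**2)
    xbar = np.sum(w * x) / V1             # weighted mean
    return np.sum(w * (x - xbar) ** 2) / (V1 - V2 / V1)

x = np.array([2.0, 3.0, 5.0, 7.0])
# With equal weights the denominator becomes n - 1, recovering Bessel's correction:
print(weighted_var_reliability(x, np.ones(4)), np.var(x, ddof=1))
```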

For correlated observations with covariance matrix $C$, the Gauss–Markov theorem states that the estimate of the mean having minimum variance is given by:

$\sigma_{\bar{x}}^2 = \left(\mathbf{1}^\mathsf{T} C^{-1} \mathbf{1}\right)^{-1}$

and

$\bar{x} = \sigma_{\bar{x}}^2 \left(\mathbf{1}^\mathsf{T} C^{-1} \mathbf{x}\right)$

where $\mathbf{x} = (x_1, \ldots, x_n)^\mathsf{T}$ and $\mathbf{1}$ is the vector of ones.
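This is a generalized-least-squares estimate of a common mean; a sketch with an assumed covariance matrix:

```python
import numpy as np

def gls_mean(x, C):
    """Minimum-variance (Gauss-Markov) estimate of a common mean for
    correlated observations with covariance matrix C."""
    ones = np.ones(len(x))
    Cinv_ones = np.linalg.solve(C, ones)   # C^{-1} 1
    var_mean = 1.0 / (ones @ Cinv_ones)    # (1^T C^{-1} 1)^{-1}
    return var_mean * (Cinv_ones @ x), var_mean

C = np.array([[1.0, 0.5, 0.25],            # assumed AR(1)-like covariance
              [0.5, 1.0, 0.5],
              [0.25, 0.5, 1.0]])
print(gls_mean(np.array([1.1, 0.9, 1.3]), C))
```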

Consider the time series of an independent variable $x$ and a dependent variable $y$, with observations sampled at discrete times $t_i$. In many common situations, the value of $y$ at time $t_i$ depends not only on $x_i$ but also on its past values. Commonly, the strength of this dependence decreases as the separation of observations in time increases.

In this scenario, most frequently the decrease in interaction strength obeys a negative exponential law.

If the observations are sampled at equidistant times, an exponential decrease is equivalent to a decrease by a constant fraction $0 < \Delta < 1$ at each time step. Setting $w = 1 - \Delta$, we can define $m$ normalized weights by

$w_i = \frac{w^{i-1}}{V_1}$

where $V_1$ is the sum of the unnormalized weights. The damping constant $w$ must correspond to the actual decrease of interaction strength. If this cannot be determined from theoretical considerations, then the following properties of exponentially decreasing weights are useful in making a suitable choice: at step $(1-w)^{-1}$ the weight is approximately $e^{-1}(1-w) \approx 0.37(1-w)$, the head area (the total weight of the first $(1-w)^{-1}$ steps) is approximately $1 - e^{-1} \approx 0.63$, and the remaining tail area is approximately $e^{-1} \approx 0.37$.
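These properties can be verified numerically; in the sketch below the damping constant $w = 0.95$ is an arbitrary choice:

```python
import numpy as np

w = 0.95                                      # damping constant (arbitrary choice)
m = 200                                       # window long enough to capture the tail
weights = w ** np.arange(m)                   # unnormalized weights w^0, w^1, ...
weights /= weights.sum()                      # normalize so the weights sum to 1

k = round(1 / (1 - w))                        # characteristic step (1 - w)^{-1} = 20
print(weights[k - 1], np.exp(-1) * (1 - w))   # weight at step k is about e^{-1}(1-w)
print(weights[:k].sum(), 1 - np.exp(-1))      # head area is about 1 - e^{-1} ~ 0.63
```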