Qualitative variation

An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions.

Indices of qualitative variation are analogous to information entropy, which is minimized when all cases belong to a single category and maximized in a uniform distribution.
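The two extremes described above can be checked numerically. The following is a minimal sketch, assuming entropy is measured in bits (base-2 logarithm); the function name `shannon_entropy` and the example counts are illustrative only.

```python
import math

def shannon_entropy(counts):
    """Shannon entropy (in bits) of a list of category counts."""
    total = sum(counts)
    # Empty categories contribute nothing to the sum.
    probs = [c / total for c in counts if c > 0]
    return sum(p * math.log2(1 / p) for p in probs)

single = shannon_entropy([12, 0, 0, 0])   # all cases in one category
skewed = shannon_entropy([9, 1, 1, 1])    # somewhere in between
uniform = shannon_entropy([3, 3, 3, 3])   # cases spread evenly

print(single, skewed, uniform)
```

With four categories the uniform case reaches the largest possible value, log2(4) = 2 bits, while the single-category case gives 0.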

When K is large, ModVR is approximately equal to Freeman's index v. This is based on the range around the mode.
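The large-K approximation can be illustrated with a short sketch. It assumes the usual definitions, which are not stated in this passage: Freeman's v = 1 - f_m/N and ModVR = K(N - f_m) / (N(K - 1)), where f_m is the modal frequency, N the number of cases and K the number of categories. The function names are hypothetical.

```python
def freeman_v(counts):
    """Assumed form of Freeman's v: share of cases outside the modal category."""
    n = sum(counts)
    return 1 - max(counts) / n

def mod_vr(counts):
    """Assumed form of ModVR: v rescaled by K / (K - 1)."""
    k = len(counts)
    return k / (k - 1) * freeman_v(counts)

# One modal category of 10 cases plus K - 1 categories of 1 case each.
for k in (3, 30, 300):
    counts = [10] + [1] * (k - 1)
    print(k, round(freeman_v(counts), 4), round(mod_vr(counts), 4))
```

Under these definitions the two indices differ only by the factor K / (K - 1), which tends to 1 as K grows.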

The second is based on MNDif.

This index was originally developed by Claude Shannon for use in specifying the properties of communication channels.

Wilcox adapted a proposal of Kaiser[6] based on the geometric mean and created the B' index.

M1 and M2 can be interpreted in terms of variance of a multinomial distribution (Swanson 1976) (there called an "expanded binomial model").
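One way to read the variance interpretation is sketched below. It assumes the usual forms M1 = 1 - Σ p_i² and M2 = K/(K - 1) · M1, which are not restated in this passage, and compares M1 with the sum of the per-category variances p_i(1 - p_i) of a single multinomial draw. The function names and counts are illustrative.

```python
def gibbs_m1(counts):
    """Assumed form of M1: one minus the sum of squared proportions."""
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def gibbs_m2(counts):
    """Assumed form of M2: M1 rescaled so a uniform distribution scores 1."""
    k = len(counts)
    return k / (k - 1) * gibbs_m1(counts)

counts = [50, 30, 20]
props = [c / sum(counts) for c in counts]

# Variance of each category indicator in a single multinomial trial.
summed_variance = sum(p * (1 - p) for p in props)

print(gibbs_m1(counts), summed_variance, gibbs_m2(counts))
```

The first two printed values agree because Σ p_i(1 - p_i) = 1 - Σ p_i² whenever the proportions sum to 1.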

The M1 statistic defined above has been proposed several times in a number of different settings under a variety of names.

Simpson's D is defined as $D = 1 - \sum_{i=1}^{K} \frac{n_i (n_i - 1)}{n (n - 1)}$, where n is the total sample size and n_i is the number of items in the ith category.
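A minimal sketch of the computation follows, assuming the without-replacement form D = 1 - Σ n_i(n_i - 1) / (n(n - 1)). The function name `simpson_d` and the counts are illustrative.

```python
def simpson_d(counts):
    """Simpson's D from category counts (assumed without-replacement form)."""
    n = sum(counts)
    if n < 2:
        raise ValueError("at least two items are needed")
    same_category_pairs = sum(c * (c - 1) for c in counts)
    return 1 - same_category_pairs / (n * (n - 1))

print(simpson_d([50, 30, 20]))  # mixed sample
print(simpson_d([10, 0, 0]))    # every item in one category
```

Read this way, D is the probability that two items drawn without replacement fall in different categories, so it is 0 when every item belongs to one category.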

Greenberg's monolingual non-weighted index of linguistic diversity[21] is the M2 statistic defined above.

Note: This index was designed to measure women's participation in the workplace: the two subtypes it was developed for were male and female.

The Berger–Parker index, named after Wolfgang H. Berger and Frances Lawrence Parker, equals the maximum proportion pi in the data set, that is, the proportional abundance of the most abundant category.

This is taken from information theory where N is the total number in the sample and pi is the proportion in the ith category.

An approximate formula for the standard deviation (SD) of H is where pi is the proportion made up by the ith category and N is the total in the sample.

A more accurate approximate value of the variance of H (var(H)) is given by[31] where N is the sample size and K is the number of categories.

While X is best estimated numerically, an approximate value can be obtained by solving the following two equations where K is the number of categories and N is the total sample size.

The variance of α is approximately[34]

This index (Dw) is the distance between the Lorenz curve of species distribution and the 45-degree line.

[39] It is defined as where In these equations xij and xkj are the number of times the jth data type appears in the ith or kth sample respectively.

The standard deviation is estimated from the formula derived by Pielou where pi is the proportion made up by the ith category and N is the total in the sample.

Several of these indexes have been developed to document the degree to which different data types of interest may coexist within a geographic area.

Masaaki Morisita's index of dispersion (Im) is the scaled probability that two points chosen at random from the whole population are in the same sample.
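The verbal definition above can be sketched in code. This assumes the commonly quoted form, in which the probability that two randomly chosen points fall in the same sample, Σ x(x - 1) / (N(N - 1)), is scaled by the number of samples; the article's own formula may be parameterized differently. The function name is hypothetical.

```python
def morisita_index(sample_counts):
    """Morisita's index of dispersion (assumed form).

    sample_counts: number of points observed in each sample (quadrat).
    """
    q = len(sample_counts)   # number of samples
    n = sum(sample_counts)   # total number of points
    if n < 2:
        raise ValueError("at least two points are needed")
    # Probability that two points drawn without replacement share a sample.
    same_sample_prob = sum(x * (x - 1) for x in sample_counts) / (n * (n - 1))
    return q * same_sample_prob

print(morisita_index([5, 5, 5, 5]))   # evenly spread points
print(morisita_index([20, 0, 0, 0]))  # all points clumped in one sample
```

Under this form, values below 1 indicate points that are spread more evenly than chance, and the maximum equals the number of samples.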

[68] where I is an index of diversity, Imax and Imin are the maximum and minimum values of I between the samples being compared.

Loevinger has suggested a coefficient H defined as follows: where pmax and pmin are the maximum and minimum proportions in the sample.

A modified version of the Manhattan distance can be used to find a zero (root) of a polynomial of any degree using Lill's method.

Its statistical properties were examined by Sanchez et al.,[74] who recommended a bootstrap procedure to estimate confidence intervals when testing for differences between samples.

The Potential for Conflict Index (PCI) describes the ratio of scoring on either side of a rating scale's centre point.

The PCI can be computed only for scales with a neutral center point and an equal number of response options on either side of it.

For example, five-, seven- and nine-point scales with a uniform distribution of responses give PCIs of 0.60, 0.57 and 0.56 respectively.
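The PCI formula is not given in this passage, so the sketch below rests on an assumption: PCI = (1 - |S+/Xt - S-/Xt|) · Xt/Z, where S+ and S- are the summed absolute scores on the positive and negative sides, Xt = S+ + S-, and Z = n · Xmax is the largest possible total for n respondents. Scale points are coded symmetrically around a neutral 0. The function name `pci` is hypothetical.

```python
def pci(counts):
    """Potential for Conflict Index under the assumed formula above.

    counts: number of responses at each scale point, ordered from the most
    negative to the most positive option, with a neutral point in the middle.
    """
    k = len(counts)
    if k % 2 == 0:
        raise ValueError("the scale needs a neutral centre point")
    half = k // 2
    scores = range(-half, half + 1)
    n = sum(counts)
    neg = sum(abs(s) * c for s, c in zip(scores, counts) if s < 0)
    pos = sum(s * c for s, c in zip(scores, counts) if s > 0)
    total = neg + pos
    if total == 0:
        return 0.0  # everyone chose the neutral point
    z = n * half    # largest possible sum of absolute scores
    return (1 - abs(pos / total - neg / total)) * total / z

print(pci([10, 0, 0, 0, 10]))  # responses split between the two extremes
print(pci([0, 0, 0, 0, 20]))   # full agreement at one extreme
```

Under this assumed formula, a sample split evenly between the two extremes scores 1 and complete agreement scores 0.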

Vaske et al. suggest the use of a t test to compare the values of the PCI between samples if the PCIs are approximately normally distributed.

If there are n units in the sample and they are randomly distributed into k categories (n ≤ k), this can be considered a variant of the birthday problem.
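The birthday-problem analogy can be made concrete with a short sketch. It assumes each unit falls into one of the k categories independently and with equal probability, and computes the chance that no category receives more than one unit. The function name is illustrative.

```python
def prob_all_distinct(n, k):
    """Probability that n units placed uniformly at random into k categories
    all land in different categories."""
    if n > k:
        return 0.0  # pigeonhole: some category must be shared
    p = 1.0
    for i in range(n):
        # The (i + 1)-th unit must avoid the i categories already occupied.
        p *= (k - i) / k
    return p

# Classic birthday setting: 23 units, 365 categories.
print(round(prob_all_distinct(23, 365), 4))
```

The complement, 1 - prob_all_distinct(n, k), is the probability that at least two units share a category.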

In some cases it is useful to not standardize an index to run from 0 to 1, regardless of number of categories or samples (Wilcox 1973, pp.