Linkage disequilibrium

What does that tell us about the expected population frequency of individuals with red hair and blue eyes?

Are all redheads expected to have blue eyes, just because the genes controlling these characters are closely linked?

(See the note below about genetic nomenclature.) If the A and B alleles are independent in a population, then, by definition, pAB is simply the product pApB.

The difference between these two is given the designation D, the 'coefficient of linkage disequilibrium': D = pAB - pApB. Departure of D from zero indicates LD.
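As a minimal sketch of this definition, the snippet below computes D from a haplotype frequency and the two allele frequencies. The function name and the numerical frequencies are hypothetical, chosen only for illustration.

```python
def ld_coefficient(p_ab: float, p_a: float, p_b: float) -> float:
    """Return D = pAB - pA * pB for one pair of alleles."""
    return p_ab - p_a * p_b


# Hypothetical allele frequencies (illustrative values, not real data).
p_a, p_b = 0.6, 0.5

# If A and B are independent, pAB equals pA * pB and D is zero.
d_independent = ld_coefficient(0.30, p_a, p_b)

# If AB haplotypes are more common than expected, D is positive.
d_associated = ld_coefficient(0.40, p_a, p_b)

print(d_independent, d_associated)
```

A non-zero result only says that the two alleles are statistically associated in the population; it says nothing by itself about how closely the loci are linked.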

So, despite its widespread popular usage, its use is now avoided in genetics journals (see [3] for a discussion about the changing definition of the gene).

It was introduced for cases where there is known recombination but where the population has not come to equilibrium for the gene pair in question.

[6] But the most prominent uses of LD now involve very closely linked DNA bases (see below).

The molecular era for population genetics can be said to date from 1966[7] following the studies of Lewontin and Hubby in Drosophila[8] and Harris[9] in humans.

Using protein electrophoresis, these authors showed that around one third of loci must be 'polymorphic', having some genetic differences between individuals in the population.

Subsequent DNA sequencing, e.g. in the International HapMap Project, has shown that protein studies considerably underestimate the amount of polymorphism.

There will usually be thousands of genetic differences, termed single nucleotide polymorphisms or SNPs, within short regions of the genome.

[10][11][12] Studies such as those of Robbins[4] referred to above essentially assume an infinite population size.

This has had enormous importance in diverse fields of human genetics and animal breeding.

This has allowed the mapping of causal genes in human genetics, using genome-wide association studies (GWAS).

It has allowed DNA 'breeding values' to be used as predictors, leading to advances in animal and plant breeding.

Then summing over the four classes:

Σfxy = 1·g1 + 0·g2 + 0·g3 + 0·g4 = g1
Σfx = g1 + g2 = pA
Σfy = g1 + g3 = pB

The covariance between x and y values is Σfxy - Σfx Σfy = g1 - pA pB, which is equivalent to the LD coefficient, D, as defined above.
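The covariance argument can be checked numerically. The sketch below assumes the four gamete classes are ordered AB, Ab, aB, ab, with x = 1 when a gamete carries A and y = 1 when it carries B; the frequencies are hypothetical and serve only as an illustration.

```python
# Hypothetical gamete frequencies g1..g4, assumed order: AB, Ab, aB, ab.
g = [0.4, 0.2, 0.1, 0.3]

# Indicator variables: x marks the A allele, y marks the B allele.
x = [1, 1, 0, 0]
y = [1, 0, 1, 0]

# Frequency-weighted sums over the four classes.
sum_fxy = sum(f * xi * yi for f, xi, yi in zip(g, x, y))  # equals g1
sum_fx = sum(f * xi for f, xi in zip(g, x))               # equals pA
sum_fy = sum(f * yi for f, yi in zip(g, y))               # equals pB

# Covariance of the indicators, and D computed directly as pAB - pA * pB.
covariance = sum_fxy - sum_fx * sum_fy
d = g[0] - sum_fx * sum_fy

print(covariance, d)
```

Both quantities are computed from the same sums, so they agree for any set of gamete frequencies that add up to one.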

This LD measure was introduced by Sewall Wright[14] and its use popularised by Hill and Robertson.

It is also possible to define linkage disequilibrium among three or more alleles, but these higher-order associations are not commonly used in practice.

This poses an issue when comparing linkage disequilibrium between alleles with differing frequencies.

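D can be normalised by dividing it by the theoretical maximum difference between the observed and expected allele frequencies as follows: D' = D / Dmax, where Dmax = min{pA pB, (1 - pA)(1 - pB)} when D < 0 and Dmax = min{pA(1 - pB), (1 - pA) pB} when D > 0. The value of D' then lies in the range -1 ≤ D' ≤ 1.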
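A minimal sketch of this normalisation follows. It assumes the common convention in which D is divided by the largest magnitude it could take given the allele frequencies, so that the result lies between -1 and 1. The function name `d_prime` and the example frequencies are hypothetical.

```python
def d_prime(p_ab: float, p_a: float, p_b: float) -> float:
    """Normalise D by its maximum possible magnitude (assumed convention)."""
    d = p_ab - p_a * p_b
    if d == 0:
        return 0.0
    if d < 0:
        # Largest possible deficit of AB haplotypes.
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    else:
        # Largest possible excess of AB haplotypes.
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    return d / d_max


# Hypothetical frequencies: pA = 0.6, pB = 0.5.
partial = d_prime(0.40, 0.6, 0.5)        # intermediate association
complete_pos = d_prime(0.50, 0.6, 0.5)   # every B haplotype also carries A
complete_neg = d_prime(0.10, 0.6, 0.5)   # AB as rare as the frequencies allow

print(partial, complete_pos, complete_neg)
```

Because the denominator depends on the allele frequencies, two allele pairs with the same D can have quite different normalised values.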

The deviation of the observed frequency of a haplotype from the expected is a quantity[4] called the linkage disequilibrium[6] and is commonly denoted by a capital D: D = pAB - pApB. Thus, if the loci were inherited independently, then pAB = pApB and D = 0.

For example, if we aim to create an association map in a case-control study, then we may use the d method due to its asymmetry.

may be very useful would include measuring the recombination rate in an evolving population, or detecting disease associations.

[16] In the absence of evolutionary forces other than random mating, Mendelian segregation, random chromosomal assortment, and chromosomal crossover (i.e. in the absence of natural selection, inbreeding, and genetic drift), the linkage disequilibrium measure D converges to zero over time, at a rate that depends on the recombination rate between the two loci.

, and as these copies are initially in the two different gametes that formed the diploid genotype, these are independent events so that the probabilities can be multiplied.

If at some time we observe linkage disequilibrium, it will disappear in the future due to recombination.
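The decay can be illustrated with a short sketch. It assumes the standard textbook recurrence for a large random-mating population, in which D shrinks by a factor (1 - c) each generation; the symbol c for the recombination fraction, the function name, and the starting values are assumptions made for this example.

```python
def ld_decay(d0: float, c: float, generations: int) -> list[float]:
    """Return D for generations 0..n under the assumed recurrence D' = (1 - c) * D."""
    values = [d0]
    for _ in range(generations):
        values.append((1 - c) * values[-1])
    return values


# Unlinked loci (c = 0.5): D halves every generation.
unlinked = ld_decay(0.25, 0.5, 5)

# Tightly linked loci (c = 0.01): D decays far more slowly.
linked = ld_decay(0.25, 0.01, 100)

print(unlinked[-1], linked[-1])
```

Under this assumption, even unlinked loci take several generations to approach equilibrium, while disequilibrium between tightly linked loci can persist for hundreds of generations.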

This method has the advantage of being easy to interpret, but it cannot display information about other variables that may be of interest.

The advantage of this method is that it shows the individual genotype frequencies and includes a visual difference between absolute (where the alleles at the two loci always appear together) and complete (where alleles at the two loci show a strong connection but with the possibility of recombination) linkage disequilibrium by the shape of the graph.

[19] Another visualization option is forests of hierarchical latent class models (FHLCM).

The Ensembl project integrates HapMap data with other genetic information from dbSNP.

A heatmap showing the linkage disequilibrium between genetic loci, detected using the GAM method.