The most common situation arises when genotypes are collected at a set of polymorphic sites from a group of individuals.
For example, in human genetics, genome-wide association studies (GWAS) collect genotypes from thousands of individuals at between 200,000 and 5,000,000 SNPs using microarrays.
Genotypes measure the unordered combination of alleles at each locus, whereas haplotypes represent the genetic information on multiple loci that have been inherited together from an individual's parents.
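The distinction can be made concrete with a minimal sketch. It assumes biallelic sites and a genotype coded as the number of alternate alleles at each site (0, 1, or 2); the function name `compatible_haplotype_pairs` and this coding are illustrative choices, not part of any particular phasing program. The function lists every unordered pair of haplotypes that would produce the observed genotype.

```python
from itertools import product


def compatible_haplotype_pairs(genotype):
    """Enumerate unordered haplotype pairs consistent with a genotype.

    genotype: sequence of alternate-allele counts per site (0, 1, or 2).
    Returns a sorted list of (haplotype, haplotype) tuples.
    """
    # Only heterozygous sites (count == 1) are ambiguous.
    het_sites = [i for i, g in enumerate(genotype) if g == 1]
    pairs = set()
    for bits in product((0, 1), repeat=len(het_sites)):
        # Homozygous sites are fixed: 0 -> allele 0, 2 -> allele 1.
        h1 = [g // 2 for g in genotype]
        h2 = list(h1)
        # At each heterozygous site the two haplotypes carry opposite alleles.
        for site, b in zip(het_sites, bits):
            h1[site] = b
            h2[site] = 1 - b
        # Sort within the pair so (h1, h2) and (h2, h1) count once.
        pairs.add(tuple(sorted((tuple(h1), tuple(h2)))))
    return sorted(pairs)


if __name__ == "__main__":
    for pair in compatible_haplotype_pairs((1, 1, 2)):
        print(pair)
```

A genotype with k heterozygous sites (k ≥ 1) is compatible with 2^(k-1) unordered haplotype pairs, which is why exhaustive enumeration quickly becomes infeasible and statistical methods are needed.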
These approaches could handle only small numbers of sites at once, although sequential versions, specifically the SNPHAP method, were later developed.
The most accurate and widely used methods for haplotype estimation utilize some form of hidden Markov model (HMM) to carry out inference.
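As a rough illustration of the kind of HMM involved, the sketch below computes the likelihood of one haplotype under a simplified copying model, in which the hidden state at each site is the reference haplotype currently being copied. This is an assumption-laden toy, not the implementation of any method named here: the function name `copying_likelihood`, the constant per-site switch probability `rho`, and the miscopy probability `eps` are all hypothetical choices made for the example.

```python
def copying_likelihood(target, reference, rho=0.01, eps=0.001):
    """Forward algorithm for a simple haplotype-copying HMM.

    target:    sequence of alleles (0/1) for the haplotype being scored.
    reference: list of reference haplotypes, each the same length as target.
    rho:       assumed probability of re-drawing the copied haplotype
               between adjacent sites (hypothetical parameter).
    eps:       assumed probability that the copied allele is miscopied
               (hypothetical parameter).
    Returns P(target | reference) under this toy model.
    """
    k = len(reference)

    def emit(j, site):
        # Emission: copy the reference allele, with a small error rate.
        return 1 - eps if reference[j][site] == target[site] else eps

    # Uniform prior over which reference haplotype is copied at the first site.
    alpha = [emit(j, 0) / k for j in range(k)]
    for site in range(1, len(target)):
        total = sum(alpha)
        # Either keep copying haplotype j, or re-draw uniformly from all k.
        alpha = [
            emit(j, site) * ((1 - rho) * alpha[j] + rho * total / k)
            for j in range(k)
        ]
    return sum(alpha)


if __name__ == "__main__":
    ref = [(0, 0, 0, 0), (1, 1, 1, 1)]
    print(copying_likelihood((0, 0, 0, 0), ref))  # matches a reference haplotype
    print(copying_likelihood((0, 1, 0, 1), ref))  # needs a switch at every site
```

A target that matches one reference haplotype along its whole length scores far higher than one that can only be explained by repeated switches, which is the signal such models exploit when choosing between candidate phasings.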
PHASE was the first method to utilize ideas from coalescent theory concerning the joint distribution of haplotypes.
The fastPHASE [4] and BEAGLE methods [5] introduced haplotype cluster models applicable to GWAS-sized datasets.
IMPUTE2 introduced the idea of carefully choosing which subset of haplotypes to condition on to improve accuracy.