GeneMark

GeneMark is a generic name for a family of ab initio gene prediction algorithms and software programs developed at the Georgia Institute of Technology in Atlanta.

The next important step in the development of the algorithm was the introduction of self-training, or unsupervised training, of the model parameters in the gene prediction tool GeneMarkS (2001).

The new algorithm, GeneMarkS-2, was designed to adjust automatically to the types of gene expression patterns and to changes in GC content along the genomic sequence.

A surprisingly accurate answer was found by introducing parameter generating functions that depend on a single variable, the G+C content of the sequence (the "heuristic method", 1999).
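As a rough illustration of the idea, the sketch below computes the G+C content of a sequence and feeds it to a function that returns one model parameter. The linear form, the function names `gc_content` and `linear_parameter`, and the coefficients are assumptions made for this example only; they are not the functions or values used in GeneMark.

```python
def gc_content(sequence: str) -> float:
    """Return the fraction of G and C among the A/C/G/T bases of a sequence."""
    seq = sequence.upper()
    counted = sum(seq.count(base) for base in "ACGT")
    if counted == 0:
        raise ValueError("sequence contains no A, C, G or T bases")
    return (seq.count("G") + seq.count("C")) / counted


def linear_parameter(gc: float, intercept: float, slope: float) -> float:
    """Hypothetical parameter generating function (an assumption of this sketch):
    one model parameter as a linear function of G+C content, clipped to [0, 1]."""
    return min(1.0, max(0.0, intercept + slope * gc))


# Placeholder coefficients for illustration only, not fitted values.
gc = gc_content("ATGCGCGGCTAA")
freq_g = linear_parameter(gc, intercept=0.05, slope=0.40)
print(f"G+C content: {gc:.3f}, illustrative parameter value: {freq_g:.3f}")
```

The point of the sketch is that a single measurable quantity, the G+C content, is enough to produce a parameter value without a training set of known genes; a real model would need one such function for every parameter.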

Subsequently, analysis of several hundred prokaryotic genomes led to the development of a more advanced heuristic method in 2010 (implemented in MetaGeneMark).

The initial version of the eukaryotic GeneMark.hmm required manual compilation of training sets of protein-coding sequences for estimating the algorithm parameters.

The versatility and accuracy of the eukaryotic gene finders of the GeneMark family have led to their incorporation into a number of genome annotation pipelines.