GLIMMER

GLIMMER was the first system to use an interpolated Markov model[2] to identify coding regions.

The GLIMMER software is open source and is maintained by Steven Salzberg, Art Delcher, and their colleagues at the Center for Computational Biology[3] at Johns Hopkins University.

The original GLIMMER algorithms and software were designed by Art Delcher, Simon Kasif and Steven Salzberg and applied to bacterial genome annotation in collaboration with Owen White.

This paper[4] describes significant technical improvements, such as using an interpolated context model instead of an interpolated Markov model and resolving overlapping genes, which improve the accuracy of GLIMMER.

This paper[5] describes several major changes made to the GLIMMER system, including improved methods for identifying coding regions and start codons.

Reverse scanning helps to identify more accurately the coding portion of the gene, which is contained in the context window of the IMM.

GLIMMER 3.0 also improves the generated training set by comparing the long ORFs with a universal amino acid distribution drawn from widely disparate bacterial genomes.

GLIMMER 3.0 uses a new algorithm for scanning coding regions, a new start site detection module, and an architecture which integrates all gene predictions across an entire genome.[5]

Minimum description length

The GLIMMER project helped introduce and popularize the use of variable-length models in computational biology and bioinformatics, which have subsequently been applied to numerous problems such as protein classification.

Prediction and compression are intimately linked through the Minimum Description Length principle.

The basic idea is to create a dictionary of frequent words (motifs in biological sequences).
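As a minimal illustration of the dictionary idea (not GLIMMER's own code), the sketch below counts every word of length k in a set of sequences and keeps the words seen at least a given number of times. The function name `frequent_words` and the parameters `k` and `min_count` are assumptions chosen for this example.

```python
from collections import Counter


def frequent_words(sequences, k, min_count):
    """Return {word: count} for every k-mer seen at least min_count times."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # Slide a window of width k across the sequence.
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return {word: n for word, n in counts.items() if n >= min_count}


# Two toy sequences that share a short motif.
seqs = ["ATGAGGAGGTAA", "CCAGGAGGCTGA"]
dictionary = frequent_words(seqs, k=5, min_count=2)
```

Here only the two 5-mers common to both toy sequences survive the count cutoff; a real dictionary would be built from far more data and typically over several word lengths.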

As with the development of HMMs in computational biology, the authors of GLIMMER were conceptually influenced by the earlier application of another variant of interpolated Markov models to speech recognition by researchers such as Fred Jelinek (IBM) and Eric Ristad (Princeton).

The first program, build-imm, takes an input set of sequences and outputs the interpolated Markov model as follows.

$$\mathrm{IMM}_k(S_x) = \lambda_k(S_{x-1}) \cdot P_k(S_x) + [1 - \lambda_k(S_{x-1})] \cdot \mathrm{IMM}_{k-1}(S_x)$$

Here $P_k(S_x)$ is the estimate obtained from the training data of the probability of the base located at position $x$ in the $k$-th order model, and $\lambda_k(S_{x-1})$ is the weight of the $k$-mer ending at position $x-1$.

The value of $\lambda$ associated with $P_k(S_x)$ can be regarded as a measure of confidence in the accuracy of this value as an estimate of the true probability.

When there are insufficient sample occurrences of a context string, build-imm employs additional criteria to determine the value of $\lambda$.

For a given context string, build-imm compares the observed frequencies of the four possible following bases with the previously calculated interpolated Markov model probabilities using the next shorter context.

Using a $\chi^2$ test, build-imm determines how likely it is that the four observed frequencies are consistent with the IMM values from the next shorter context.[1]

The second program, glimmer, then uses this IMM to identify putative genes in an entire genome.

GLIMMER identifies all open reading frames that score higher than a threshold and checks them for overlapping genes.
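The scores being thresholded come from blending each context's estimate with the estimate from the next shorter context. The sketch below shows that blending under simplifying assumptions and is not the build-imm implementation: the weight is a plain count ramp (`total / threshold`, capped at 1) standing in for the frequency and statistical-test criteria, add-one smoothing is applied to every estimate, and a sequence is scored as a sum of log probabilities. All function names are hypothetical.

```python
import math
from collections import defaultdict

BASES = "ACGT"  # assumption: sequences contain only these four letters


def build_counts(sequences, max_order):
    """Count each base after every context of length 0..max_order."""
    counts = defaultdict(lambda: dict.fromkeys(BASES, 0))
    for seq in sequences:
        for x, base in enumerate(seq):
            for k in range(min(x, max_order) + 1):
                counts[seq[x - k:x]][base] += 1
    return counts


def imm_prob(counts, context, base, threshold=400):
    """Interpolated probability of `base` following `context`."""
    seen = counts.get(context, {})
    total = sum(seen.values())
    # Add-one smoothing keeps every estimate positive (assumption).
    p_k = (seen.get(base, 0) + 1) / (total + 4)
    if not context:
        return p_k  # order 0: nothing shorter to fall back on
    # Simplified confidence weight: trust well-sampled contexts more.
    lam = min(1.0, total / threshold)
    shorter = imm_prob(counts, context[1:], base, threshold)
    return lam * p_k + (1.0 - lam) * shorter


def log_score(counts, seq, max_order, threshold=400):
    """Sum of log interpolated probabilities over every base of seq."""
    return sum(
        math.log(imm_prob(counts, seq[max(0, x - max_order):x], seq[x], threshold))
        for x in range(len(seq))
    )


training = ["ATGATGATGATGATGATG"]
counts = build_counts(training, max_order=3)
like = log_score(counts, "ATGATG", max_order=3, threshold=10)
unlike = log_score(counts, "CCCCCC", max_order=3, threshold=10)
```

With this toy training set, a sequence resembling the training data (`like`) scores higher than one that does not (`unlike`). In practice the model would be trained on known or long-ORF coding sequence, and the threshold on the resulting score would be chosen empirically.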

When two candidate genes A and B overlap and A is significantly longer than B, B is rejected; otherwise both A and B are called genes, with a doubtful overlap.
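The length-based overlap rule can be sketched as follows. This is an illustration only: the text does not say what "significantly longer" means, so the `length_ratio` cutoff, the `min_overlap` parameter, and the `Orf` record are assumptions, and the sketch ignores strand and any rescoring of the overlapping region.

```python
from dataclasses import dataclass


@dataclass
class Orf:
    start: int    # 0-based, inclusive
    end: int      # exclusive
    score: float

    @property
    def length(self):
        return self.end - self.start


def overlap(a, b):
    """Number of positions shared by two ORFs."""
    return max(0, min(a.end, b.end) - max(a.start, b.start))


def call_genes(orfs, threshold, min_overlap=1, length_ratio=2.0):
    """Return (genes, doubtful_pairs) after thresholding and overlap checks."""
    # Keep ORFs above the score threshold, longest first, so that every
    # gene already accepted is at least as long as the one being tested.
    candidates = sorted((o for o in orfs if o.score > threshold),
                        key=lambda o: o.length, reverse=True)
    genes, doubtful = [], []
    for b in candidates:
        rejected = False
        partners = []
        for a in genes:
            if overlap(a, b) >= min_overlap:
                if a.length >= length_ratio * b.length:
                    rejected = True   # A is much longer: drop B
                    break
                partners.append(a)    # similar lengths: keep both
        if not rejected:
            genes.append(b)
            doubtful.extend((a, b) for a in partners)
    return genes, doubtful


A = Orf(0, 900, 5.0)
B = Orf(850, 1100, 4.0)     # overlaps A and is much shorter
C = Orf(2000, 2600, 3.5)
D = Orf(2500, 3000, 3.2)    # overlaps C, similar length
E = Orf(4000, 4300, 0.5)    # below the score threshold
genes, doubtful = call_genes([A, B, C, D, E], threshold=1.0)
```

In this toy input, B is rejected in favour of A, while C and D are both kept and reported as a doubtful overlapping pair.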

GLIMMER results are passed as input to the RBSfinder program to predict ribosome binding sites.

A Gibbs sampling algorithm is used to identify a shared motif in a set of sequences.

ELPH then computes the position weight matrix (PWM), which GLIMMER 3 uses to score any potential RBS found by RBSfinder.

This process can be repeated for many iterations to obtain a more consistent PWM and more consistent gene prediction results.
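To make the PWM step concrete, the sketch below builds a log-odds matrix from a handful of aligned sites and uses it to score windows of an upstream region. It is a generic PWM illustration, not the ELPH or GLIMMER code: the function names, the pseudocount, the uniform background of 0.25, and the example sites are all assumptions.

```python
import math

BASES = "ACGT"


def build_pwm(sites, pseudocount=1.0, background=0.25):
    """Log-odds position weight matrix from equal-length aligned sites."""
    width = len(sites[0])
    if any(len(s) != width for s in sites):
        raise ValueError("all sites must have the same length")
    total = len(sites) + 4 * pseudocount
    pwm = []
    for pos in range(width):
        column = [s[pos] for s in sites]
        pwm.append({
            b: math.log2(((column.count(b) + pseudocount) / total) / background)
            for b in BASES
        })
    return pwm


def score_site(pwm, site):
    """Sum the per-position log-odds for one candidate site."""
    return sum(col[base] for col, base in zip(pwm, site))


def best_hit(pwm, region):
    """Return (score, offset) of the best window; region must be at least as long as the PWM."""
    width = len(pwm)
    return max((score_site(pwm, region[i:i + width]), i)
               for i in range(len(region) - width + 1))


# Hypothetical aligned sites, such as a motif finder might report.
sites = ["AGGAGG", "AGGAGG", "AGGAGA", "TGGAGG"]
pwm = build_pwm(sites)
upstream = "TTTAGGAGGTTTTATG"
score, offset = best_hit(pwm, upstream)
```

In an iterative setting, the sites selected by the best hits would be fed back to rebuild the matrix before the next round of gene prediction.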

Glimmer supports genome annotation efforts on a wide range of bacterial, archaeal, and viral species.

In a large-scale reannotation effort at the DNA Data Bank of Japan (DDBJ, which mirrors GenBank).

(They also reported that 33% of genomes used "other" programs, which in many cases meant that they could not identify the method.

Excluding those cases, Glimmer was used for 73% of the genomes for which the methods could be unambiguously identified.)

Glimmer was used by the DDBJ to re-annotate all bacterial genomes in the International Nucleotide Sequence Databases.