GENSCAN is a program based on a generalized hidden Markov model (GHMM) that predicts the locations of genes and their exon–intron boundaries in genomic sequences from a variety of organisms.[5]
The primary goal in developing GENSCAN's genomic sequence model was to identify both the general and the specific properties of the individual functional units of eukaryotic genes (e.g. exons, introns, splice sites, promoters).[3]
Because it models these elements directly, GENSCAN does not need to reference similar genes in protein sequence databases.
Its predictions therefore complement those of homology-based gene identification methods (e.g. querying protein databases with BLASTX).
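What makes a GHMM "generalized" is that each state emits a whole segment (for example, an entire exon) whose length comes from an explicit length distribution, rather than one base per step. The sketch below is a toy two-state (exon/intron) segmental Viterbi decoder meant only to illustrate that idea. The states, probabilities, uniform length distribution, and the names `EMIT`, `TRANS`, `START` and `ghmm_viterbi` are all hypothetical and far simpler than GENSCAN's model, which covers many more functional units.

```python
import math

# Hypothetical toy parameters (not GENSCAN's): exons enriched in C/G, introns in A/T.
EMIT = {
    "exon":   {"A": 0.15, "C": 0.35, "G": 0.35, "T": 0.15},
    "intron": {"A": 0.35, "C": 0.15, "G": 0.15, "T": 0.35},
}
# Segment-to-segment transitions. There are no self-transitions: in a GHMM a run
# of one state is emitted as a single segment with an explicit length.
TRANS = {"exon": {"intron": 1.0}, "intron": {"exon": 1.0}}
START = {"exon": 0.5, "intron": 0.5}


def ghmm_viterbi(seq, max_len):
    """Best parse of seq into (start, end, state) segments, each 1..max_len long."""
    n = len(seq)
    length_logp = -math.log(max_len)  # assumed uniform length distribution
    # prefix[s][i] = summed log-emission of seq[:i] under state s, so any
    # segment's emission score is a constant-time difference.
    prefix = {}
    for s, probs in EMIT.items():
        prefix[s] = [0.0]
        for base in seq:
            prefix[s].append(prefix[s][-1] + math.log(probs[base]))

    best = [dict.fromkeys(EMIT, -math.inf) for _ in range(n + 1)]
    back = [dict.fromkeys(EMIT) for _ in range(n + 1)]
    for end in range(1, n + 1):
        for s in EMIT:
            for d in range(1, min(max_len, end) + 1):  # every candidate segment length
                start = end - d
                seg = length_logp + prefix[s][end] - prefix[s][start]
                if start == 0:
                    options = [(math.log(START[s]), None)]
                else:
                    options = [
                        (best[start][p] + math.log(TRANS[p][s]), p)
                        for p in EMIT
                        if s in TRANS[p] and best[start][p] > -math.inf
                    ]
                for prev_score, prev_state in options:
                    if prev_score + seg > best[end][s]:
                        best[end][s] = prev_score + seg
                        back[end][s] = (start, prev_state)

    # Trace back from the highest-scoring final state.
    state, end, segments = max(best[n], key=best[n].get), n, []
    while end > 0:
        start, prev_state = back[end][state]
        segments.append((start, end, state))
        end, state = start, prev_state
    return segments[::-1]


print(ghmm_viterbi("ATATAT" + "GCGCGC" + "ATATAT", max_len=10))
# [(0, 6, 'intron'), (6, 12, 'exon'), (12, 18, 'intron')]
```

The inner loop over candidate lengths `d` is what separates this from an ordinary HMM. With prefix sums, each candidate segment is scored in constant time, so the work is proportional to `len(seq) * max_len`: linear in sequence length while `max_len` stays fixed, and quadratic if `max_len` must grow with the sequence.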
A notable difference from earlier gene-prediction programs is that GENSCAN's genomic sequence model treats the input exclusively as double-stranded DNA, so that genes on both strands are analyzed simultaneously.
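Analyzing both strands means a gene may be read along the reverse complement of the input, while its coordinates are still reported relative to the forward strand. The sketch below shows only that coordinate bookkeeping, using a naive search for a fixed motif. The function names and the motif are illustrative assumptions; GENSCAN handles both strands within its sequence model rather than by string search.

```python
# Map each base to its Watson-Crick complement (upper- and lower-case).
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_on_both_strands(seq, motif):
    """Occurrences of motif on either strand, as (start, end, strand) tuples
    in 0-based, half-open forward-strand coordinates."""
    n, m = len(seq), len(motif)
    hits = [(i, i + m, "+") for i in range(n - m + 1) if seq[i:i + m] == motif]
    rc = reverse_complement(seq)
    for j in range(n - m + 1):
        if rc[j:j + m] == motif:
            # rc[j:j+m] pairs with seq[n-j-m:n-j] on the forward strand.
            hits.append((n - j - m, n - j, "-"))
    return sorted(hits)


print(find_on_both_strands("ATGCCCAT", "ATG"))
# [(0, 3, '+'), (5, 8, '-')]
```

The second hit is the forward-strand `CAT` at positions 5 to 8, which reads `ATG` on the opposite strand.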
A second is its ability to model how gene structure and composition vary between regions of the human genome with different C+G content, using separate, empirically derived sets of model parameters.
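One simple way to use C+G-dependent parameters is to measure the C+G content of the input and pick a parameter set from a fixed list of bins. In the sketch below, the bin edges, the set names, and the choice of one set per input sequence are all illustrative assumptions; the actual empirically derived parameter sets are not reproduced here.

```python
# Illustrative C+G bin edges (percent) and parameter-set names: assumptions only.
CG_BINS = [(43.0, "low-CG"), (51.0, "medium-low-CG"), (57.0, "medium-high-CG")]
TOP_SET = "high-CG"


def cg_percent(seq):
    """Percentage of C+G among unambiguous bases (A, C, G, T); others are ignored."""
    seq = seq.upper()
    acgt = sum(seq.count(b) for b in "ACGT")
    if acgt == 0:
        raise ValueError("sequence contains no A/C/G/T bases")
    return 100.0 * (seq.count("C") + seq.count("G")) / acgt


def choose_parameter_set(seq):
    """Name of the (hypothetical) parameter set whose C+G bin contains seq."""
    cg = cg_percent(seq)
    for upper_edge, name in CG_BINS:
        if cg < upper_edge:
            return name
    return TOP_SET


print(choose_parameter_set("GGCCAATTN"))
# medium-low-CG  (C+G = 4 of 8 unambiguous bases = 50%)
```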
A third is that new models of donor and acceptor splice sites allow GENSCAN to capture dependencies between positions within these signals.[3]
GENSCAN's run time scales approximately linearly with sequence length for realistically sized sequences (several kilobases or longer), although its worst case is quadratic.
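The idea of splice-site models that capture dependencies between signal positions can be illustrated by contrasting two classic signal models. A weight matrix model (WMM) scores each position of a site independently, whereas a weight array model (WAM) conditions each base on the one before it, so it can represent a dependency between adjacent positions. The sketch below trains both from a few made-up aligned sites; the training data, pseudocount, and function names are hypothetical, and this is not a reproduction of GENSCAN's own donor and acceptor models.

```python
import math
from collections import Counter

BASES = "ACGT"


def train_wmm(sites, pseudocount=1.0):
    """Weight matrix model: one independent base distribution per position."""
    n = len(sites)
    model = []
    for i in range(len(sites[0])):
        counts = Counter(s[i] for s in sites)
        model.append({b: (counts[b] + pseudocount) / (n + 4 * pseudocount) for b in BASES})
    return model


def wmm_log_score(model, site):
    return sum(math.log(model[i][b]) for i, b in enumerate(site))


def train_wam(sites, pseudocount=1.0):
    """Weight array model: P(x_0) times a P(x_i | x_{i-1}) table for each later position."""
    first = train_wmm([s[:1] for s in sites], pseudocount)[0]
    cond = []
    for i in range(1, len(sites[0])):
        pairs = Counter((s[i - 1], s[i]) for s in sites)
        table = {}
        for prev in BASES:
            total = sum(pairs[(prev, b)] for b in BASES) + 4 * pseudocount
            table[prev] = {b: (pairs[(prev, b)] + pseudocount) / total for b in BASES}
        cond.append(table)
    return first, cond


def wam_log_score(model, site):
    first, cond = model
    score = math.log(first[site[0]])
    for i in range(1, len(site)):
        score += math.log(cond[i - 1][site[i - 1]][site[i]])
    return score


# Toy aligned "sites" in which the second base depends on the first (A->G, C->T).
sites = ["AG", "AG", "CT", "CT"]
wmm, wam = train_wmm(sites), train_wam(sites)
for candidate in ("AG", "AT"):
    print(candidate, wmm_log_score(wmm, candidate), wam_log_score(wam, candidate))
```

Because each position has the same base frequencies under both candidates, the WMM gives `AG` and `AT` identical scores, while the WAM scores `AT` lower, having learned that `A` is followed by `G` in the training sites.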
However, further work was still needed, as GENSCAN was shown to predict only 10–15% of genes accurately on realistic data sets.
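A figure such as "10–15% of genes" is presumably a gene-level measure, in which a gene typically counts as correct only if its whole exon structure is predicted exactly. The sketch below computes one such measure: the fraction of annotated genes whose exact set of exon coordinates appears among the predictions. The criterion, function name, and toy data are assumptions, since the text does not state which criterion the evaluation used; strand is ignored for simplicity.

```python
def gene_level_sensitivity(annotated, predicted):
    """Fraction of annotated genes whose exon coordinates exactly match a predicted gene.

    Each gene is a collection of (start, end) exon intervals; exon order is ignored.
    """
    if not annotated:
        raise ValueError("no annotated genes")
    predicted_structures = {frozenset(gene) for gene in predicted}
    exact = sum(1 for gene in annotated if frozenset(gene) in predicted_structures)
    return exact / len(annotated)


# Hypothetical toy annotations and predictions.
annotated = [
    [(100, 250), (400, 520)],
    [(900, 1000)],
    [(1500, 1600), (1700, 1800), (1900, 2000)],
    [(2500, 2600)],
]
predicted = [
    [(100, 250), (400, 520)],        # exact match
    [(900, 1010)],                   # one boundary off: not counted
    [(1500, 1600), (1700, 1800)],    # missing an exon: not counted
]
print(gene_level_sensitivity(annotated, predicted))  # 0.25
```

Under this strict criterion a single wrong exon boundary makes the whole gene count as a miss, which is why gene-level accuracy can be low even when most individual exons are predicted correctly.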