UniGene

Information on protein similarities, gene expression, cDNA clones, and genomic location is included with each entry.

In most cases, each cluster is made up of sequences produced by a single gene, including alternatively spliced transcripts.

At this stage, all clusters are "anchored," and contain either a sequence with a polyadenylation site or two ESTs labeled as coming from the 3′ end of a clone.
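The anchoring rule can be expressed as a simple predicate. The following is a minimal sketch, not UniGene's actual build code: the `Seq` record and its field names are hypothetical, and "two ESTs" is read here as "at least two."

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Seq:
    """Hypothetical record for one sequence in a cluster (illustrative fields)."""
    accession: str
    is_est: bool = False
    has_polya_site: bool = False      # sequence carries a polyadenylation site
    three_prime_label: bool = False   # EST labeled as a 3' read of its clone


def is_anchored(cluster: Iterable[Seq]) -> bool:
    """Return True if the cluster meets the anchoring rule described above."""
    members = list(cluster)
    # Condition 1: any sequence with a polyadenylation site anchors the cluster.
    if any(s.has_polya_site for s in members):
        return True
    # Condition 2 (assumed reading): at least two ESTs labeled as 3' reads.
    three_prime_ests = sum(1 for s in members if s.is_est and s.three_prime_label)
    return three_prime_ests >= 2
```

Under this reading, a cluster holding a single 3′ EST and no polyadenylated sequence would not count as anchored.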

Conversely, it appears that the majority of human genes have been identified only by ESTs; only 16% of clusters contain either an mRNA or a CDS annotated on genomic DNA.

Because fewer ESTs are available for mouse, rat, and zebrafish, the UniGene clusters for these organisms are less representative of the unique genes in each genome than the human clusters are.

A new UniGene resource, HomoloGene, includes curated and calculated orthologs and homologs for genes from human, mouse, rat, and zebrafish.

Calculated orthologs and homologs are the result of nucleotide sequence comparisons between all UniGene clusters for each pair of organisms.

A special symbol indicates that UniGene clusters in three or more organisms share a mutually consistent ortholog relationship.
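One way to read "mutually consistent" is that every pair of clusters in the group is itself an ortholog pair. The sketch below finds such triplets from a list of pairwise ortholog calls. This is an assumed interpretation, not HomoloGene's documented procedure; the function name and the input format (cluster identifiers written as an organism prefix, a dot, and a number) are illustrative.

```python
from itertools import combinations
from typing import Iterable, List, Tuple


def consistent_triplets(
    ortholog_pairs: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str, str]]:
    """Find triplets of clusters from three organisms where all pairs are orthologs."""
    # Store pairs as unordered sets so (a, b) and (b, a) are treated alike.
    pairs = {frozenset(p) for p in ortholog_pairs}
    clusters = sorted({c for p in pairs for c in p})

    def organism(cluster_id: str) -> str:
        # Assumes identifiers of the form "<prefix>.<number>".
        return cluster_id.split(".")[0]

    triplets = []
    for a, b, c in combinations(clusters, 3):
        # Require three distinct organisms.
        if len({organism(a), organism(b), organism(c)}) < 3:
            continue
        # Mutual consistency (assumed): all three pairwise relationships exist.
        if all(frozenset(x) in pairs for x in ((a, b), (a, c), (b, c))):
            triplets.append((a, b, c))
    return triplets
```

The exhaustive search over combinations is chosen for clarity; it would be too slow for a full set of clusters, where a graph-based approach would be more appropriate.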

Cluster identifiers are prefixed with Hs for Homo sapiens, Rn for Rattus norvegicus, Mm for Mus musculus, or Dr for Danio rerio.
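A short sketch of how such identifiers might be parsed follows. It assumes the identifier is written as the prefix, a dot, and a cluster number (for example, Hs.2), and it takes the zebrafish prefix to be Dr, as used in UniGene. The function and dictionary names are hypothetical.

```python
import re
from typing import Tuple

# Organism prefixes for the four species discussed here.
PREFIXES = {
    "Hs": "Homo sapiens",
    "Rn": "Rattus norvegicus",
    "Mm": "Mus musculus",
    "Dr": "Danio rerio",
}

# Assumed identifier shape: two-letter prefix, a dot, then a cluster number.
_CLUSTER_ID = re.compile(r"^([A-Z][a-z])\.(\d+)$")


def parse_cluster_id(cluster_id: str) -> Tuple[str, int]:
    """Split a cluster identifier into (organism name, cluster number)."""
    match = _CLUSTER_ID.match(cluster_id.strip())
    if match is None:
        raise ValueError(f"not a cluster identifier: {cluster_id!r}")
    prefix, number = match.group(1), int(match.group(2))
    if prefix not in PREFIXES:
        raise ValueError(f"unknown organism prefix: {prefix!r}")
    return PREFIXES[prefix], number
```

For instance, `parse_cluster_id("Mm.42")` yields the organism name together with the integer cluster number, while a malformed or unrecognized identifier raises `ValueError`.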

Clusters that contain only ESTs (i.e., no mRNAs or annotated CDSs) will be missing some of these fields, such as LocusLink, OMIM, and mRNA/Gene links.

[1] On February 1, 2019, the NCBI announced that it was retiring the UniGene database because "reference genomes are available for most organisms with a sizable research community."