Some non-coding regions appear to be mostly nonfunctional, such as introns, pseudogenes, intergenic DNA, and fragments of transposons and viruses.
Genome size in eukaryotes can vary over a wide range, even between closely related species.
This puzzling observation was originally known as the C-value Paradox, where "C" refers to the haploid genome size.
[4] The paradox was resolved with the discovery that most of the differences were due to the expansion and contraction of repetitive DNA and not the number of genes.
The reduced size of the pufferfish genome is due to a reduction in the length of introns and less repetitive DNA.
[10][11] The remainder of the genome (70% non-coding DNA) consists of promoters and regulatory sequences that are shorter than those in other plant species.
[11] Much of the repetitive DNA seen in other eukaryotes has been deleted from the bladderwort genome since that lineage split from those of other plants.
[11] The authors of the original 2013 article note that claims of additional functional elements in the non-coding DNA of animals do not seem to apply to plant genomes.
[10] According to a New York Times article, during the evolution of this species, "... genetic junk that didn't serve a purpose was expunged, and the necessary stuff was kept."[12]
According to Victor Albert of the University at Buffalo, the plant is able to expunge its so-called junk DNA and "have a perfectly good multicellular plant with lots of different cells, organs, tissue types and flowers, and you can do it without the junk."
[citation needed] The total number of noncoding genes in the human genome is controversial.
Regulatory elements were discovered in the 1960s and their general characteristics were worked out in the 1970s by studying specific transcription factors in bacteria and bacteriophage.
The exact amount of regulatory DNA in mammalian genomes is unclear because it is difficult to distinguish between spurious transcription factor binding sites and those that are functional.
[citation needed] Many regulatory sequences occur near promoters, usually upstream of the transcription start site of the gene.
Spliceosomal introns (see Figure) are only found in eukaryotes and they can represent a substantial proportion of the genome.
Combining that with about 1% coding sequences means that protein-coding genes occupy about 38% of the human genome.
[21][2] The standard biochemistry and molecular biology textbooks describe non-coding nucleotides in mRNA located between the 5' end of the gene and the translation initiation codon.
The main features of replication origins are sequences where specific initiation proteins are bound.
Pseudogenes are only a small fraction of noncoding DNA in prokaryotic genomes because they are eliminated by negative selection.
In some eukaryotes, however, pseudogenes can accumulate because selection is not powerful enough to eliminate them (see Nearly neutral theory of molecular evolution).
Retrotransposon repeated sequences, which include long interspersed nuclear elements (LINEs) and short interspersed nuclear elements (SINEs), account for a large proportion of the genomic sequences in many species.
[39] Over 8% of the human genome is made up of (mostly decayed) endogenous retrovirus sequences, as part of the over 42% fraction that is recognizably derived from retrotransposons, while another 3% can be identified as the remains of DNA transposons.
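The nesting of these percentages is easy to misread, because the endogenous retrovirus fraction is a subset of the retrotransposon fraction rather than an addition to it. The following is a minimal illustrative tally in Python, assuming the quoted figures are treated as round lower bounds ("over 8%", "over 42%"); the variable names are hypothetical and chosen only for this sketch.

```python
# Approximate fractions of the human genome quoted in the text (percent).
# "Over 8%" and "over 42%" are treated here as round lower bounds (assumption).
retrotransposon_pct = 42.0        # all recognizable retrotransposon-derived sequence
endogenous_retrovirus_pct = 8.0   # subset of the retrotransposon fraction
dna_transposon_pct = 3.0          # separate from the retrotransposon fraction

# Endogenous retroviruses are already counted inside the 42%,
# so they must not be added a second time.
transposon_derived_pct = retrotransposon_pct + dna_transposon_pct

# Retrotransposon-derived sequence other than endogenous retroviruses
# (for example LINEs and SINEs).
other_retrotransposon_pct = retrotransposon_pct - endogenous_retrovirus_pct

print(f"Transposon-derived sequence: at least about {transposon_derived_pct:.0f}%")
print(f"Retrotransposons other than endogenous retroviruses: about {other_retrotransposon_pct:.0f}%")
```

On these figures, roughly 45% of the human genome is recognizably derived from transposable elements of one kind or another.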
Highly repetitive DNA is rare in prokaryotes but common in eukaryotes, especially those with large genomes.
[47] The exact amount of nonfunctional DNA in humans and other species with large genomes has not been determined and there is considerable controversy in the scientific literature.
Genome-wide association studies (GWAS) identify linkages between alleles and observable traits such as phenotypes and diseases.
The association establishes a linkage that helps map the DNA region responsible for the trait, but it does not necessarily identify the mutations causing the disease or phenotypic difference.
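As an illustration of what such an association looks like statistically, the sketch below applies a Pearson chi-squared test to a 2x2 table of allele counts in cases and controls at a single variant. This is one simple test among several used in practice, not the method of any particular study; the function name `allelic_chi_square` and the allele counts are hypothetical, invented for this example.

```python
import math


def allelic_chi_square(case_alt, case_ref, ctrl_alt, ctrl_ref):
    """Pearson chi-squared test (1 degree of freedom) on a 2x2 allele-count table.

    Returns (chi2, p_value). Assumes every row and column total is non-zero.
    """
    table = [[case_alt, case_ref], [ctrl_alt, ctrl_ref]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_alt + ctrl_alt, case_ref + ctrl_ref]
    n = sum(row_totals)

    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Count expected if allele frequency were independent of case status.
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected

    # Survival function of the chi-squared distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical counts: the alternate allele is more common in cases than in controls.
chi2, p_value = allelic_chi_square(case_alt=60, case_ref=40, ctrl_alt=40, ctrl_ref=60)
print(f"chi2 = {chi2:.2f}, p = {p_value:.4f}")
```

A small p-value indicates only that the allele is statistically associated with the trait. The tested variant may simply lie near the causal mutation on the chromosome, which is why the association helps map a region without identifying the mutation itself.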