De novo gene birth

[38] Since this time, a plethora of genome-level studies have identified large numbers of orphan genes in many organisms, although the extent to which they arose de novo, and the degree to which they can be deemed functional, remain debated.

Genomic phylostratigraphy involves examining each gene in a focal, or reference, species and inferring the presence or absence of ancestral homologs through the use of the BLAST sequence alignment algorithms[41] or related tools.
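The assignment step of phylostratigraphy can be illustrated with a minimal sketch. It assumes the homology searches have already been run and summarized as, for each focal gene, the set of clades in which a homolog was detected. The function name `assign_phylostrata`, the gene identifiers, and the four-level lineage are hypothetical and chosen only for illustration; real analyses use many more taxonomic levels and must contend with the homology-detection errors discussed in this section.

```python
from typing import Dict, List, Set


def assign_phylostrata(hits: Dict[str, Set[str]], lineage: List[str]) -> Dict[str, str]:
    """Assign each focal gene to the oldest clade in which a homolog was detected.

    lineage: clades ordered from the oldest (most inclusive) to the focal species.
    hits:    for each gene, the clades in which a homology search found a hit.
    A gene with no hits outside the focal species falls in the youngest stratum,
    i.e. it is treated as an orphan.
    """
    strata = {}
    for gene, clades in hits.items():
        stratum = lineage[-1]  # default: no detectable homolog elsewhere
        for clade in lineage:  # walk from oldest to youngest
            if clade in clades:
                stratum = clade
                break
        strata[gene] = stratum
    return strata


# Hypothetical lineage and search results (illustrative only).
lineage = ["Eukaryota", "Fungi", "Saccharomycetaceae", "S. cerevisiae"]
hits = {
    "geneA": {"Eukaryota", "Saccharomycetaceae"},
    "geneB": {"Saccharomycetaceae"},
    "geneC": set(),
}
strata = assign_phylostrata(hits, lineage)
for gene, stratum in strata.items():
    print(gene, "->", stratum)
```

In this toy input, `geneC` has no detected homolog and is therefore placed in the focal-species stratum. That is the pattern expected of a candidate orphan, although it does not by itself show a de novo origin.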

[45][46] However, a reanalysis of studies that used phylostratigraphy in yeast, fruit flies and humans found that even when accounting for such error rates and excluding difficult-to-stratify genes from the analyses, the qualitative conclusions were unaffected.

[48][40] Confirmation that the syntenic region lacks coding potential in outgroup species allows a de novo origin to be asserted with higher confidence.

[49] There are also difficulties associated with applying synteny-based approaches to genome assemblies that are fragmented[50] or in lineages with high rates of chromosomal rearrangements, as is common in insects.

Even when the evolutionary origin of a particular coding sequence has been established, there is still a lack of consensus about what constitutes a genuine de novo gene birth event.

[61] Other experimental approaches, including screens for protein-protein and/or genetic interactions, may also be employed to confirm a biological effect for a particular de novo ORF.

Given that young, species-specific de novo genes lack deep conservation by definition, detecting statistically significant deviations of the dN/dS ratio from 1 can be difficult without an unrealistically large number of sequenced strains/populations.
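The power problem can be made concrete with a deliberately simplified sketch. It assumes that substitutions have already been counted and classified, ignores multiple hits and mutational biases, and replaces the codon-substitution likelihood models used in practice with an exact binomial test. Under neutrality, the expected share of nonsynonymous changes is taken to equal the share of nonsynonymous sites. The function names and all counts are hypothetical.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float) -> float:
    """Exact two-sided binomial p-value: total probability of outcomes
    that are no more likely than the observed count k."""
    pmf = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = pmf[k]
    return min(1.0, sum(x for x in pmf if x <= observed * (1 + 1e-9)))


def dnds_neutrality_test(nonsyn_subs, syn_subs, nonsyn_sites, syn_sites):
    """Return (dN/dS, p-value) for the null hypothesis dN/dS = 1.

    Simplifying assumption: each substitution is independently nonsynonymous
    with probability nonsyn_sites / (nonsyn_sites + syn_sites) under neutrality.
    """
    dn = nonsyn_subs / nonsyn_sites
    ds = syn_subs / syn_sites
    ratio = dn / ds if ds > 0 else float("inf")
    p_neutral = nonsyn_sites / (nonsyn_sites + syn_sites)
    p_value = binomial_two_sided_p(nonsyn_subs, nonsyn_subs + syn_subs, p_neutral)
    return ratio, p_value


# Hypothetical short ORF: 225 nonsynonymous and 75 synonymous sites.
few = dnds_neutrality_test(6, 4, 225, 75)     # 10 substitutions observed
many = dnds_neutrality_test(60, 40, 225, 75)  # same ratio, ten times the data
print("few substitutions:  dN/dS = %.2f, p = %.3g" % few)
print("many substitutions: dN/dS = %.2f, p = %.3g" % many)
```

Both inputs give the same dN/dS estimate of 0.5, but only the larger data set rejects neutrality at the conventional 0.05 level. This illustrates why short, young sequences with few observed substitutions rarely yield a significant result.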

Studies may identify de novo genes by phylostratigraphy/BLAST-based methods alone, or may employ a combination of computational techniques, and may or may not assess experimental evidence for expression and/or biological role.

Similarly, an analysis of 195 young (<35 million years old) D. melanogaster genes identified from syntenic alignments found that only 16 had arisen de novo.

[54] In contrast, an analysis focused on transcriptomic data from the testes of six D. melanogaster strains identified 106 fixed and 142 segregating de novo genes.

A newer study found that up to 39% of orphan genes in the Drosophila clade may have emerged de novo, as they overlap with non-coding regions of the genome.

[68] In primates, one early study identified 270 orphan genes (unique to humans, chimpanzees, and macaques), of which 15 were thought to have originated de novo.

[74] Similarly, an analysis of five mammalian transcriptomes found that most ORFs in mice were either very old or species specific, implying frequent birth and death of de novo transcripts.

[76] In addition to the birth and death of de novo genes at the level of the ORF, mutational and other processes also subject genomes to constant “transcriptional turnover”.

Theoretical modeling has shown that such differences are the product both of selection for features that increase the likelihood of functionalization, and of neutral evolutionary forces that influence allelic turnover.

[105] Experiments in E. coli showed that random peptides tended to have more benign effects when they were enriched for amino acids that were small, and that promoted intrinsic structural disorder.

[110] Beyond the very youngest orphans, this study found that intrinsic structural disorder (ISD) tends to decrease with increasing gene age, and that this is primarily due to amino acid composition rather than GC content.

[95] De novo proteins typically exhibit less well-defined secondary and three-dimensional structures, often lacking rigid folds and containing extensive disordered regions.

[124] Epigenetic mechanisms are also largely responsible for the permissive transcriptional environment in the testes, particularly through the incorporation into nucleosomes of non-canonical histone variants that are replaced by histone-like protamines during spermatogenesis.

[125] Analysis of the fold potential diversity shows that the majority of the amino acid sequences encoded by the intergenic ORFs of S. cerevisiae are predicted to be foldable.

[83] Furthermore, putatively non-genic ORFs long enough to encode functional peptides are numerous in eukaryotic genomes, and expected to occur at high frequency by chance.
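A back-of-the-envelope sketch shows why chance ORFs are expected to be common. It assumes a random sequence with independent, equally frequent bases, so that 61 of the 64 codons are sense codons under the standard genetic code. Real genomes deviate from this through GC content and other compositional biases. The function names and the example sequence are hypothetical, and the scanner reads only the forward strand.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}  # standard genetic code


def prob_no_stop(n_codons: int) -> float:
    """Probability that n_codons random codons contain no stop codon,
    assuming independent, equally frequent bases."""
    return (61 / 64) ** n_codons


def find_orfs(seq: str, min_codons: int) -> list:
    """Return (start, end) coordinates of ATG...stop ORFs on the forward strand.

    Length is counted in codons from the ATG up to, but excluding, the stop.
    Coordinates are 0-based and end-exclusive, with the stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None  # look for the next ORF in this frame
    return orfs


print("P(no stop in 100 random codons) = %.4f" % prob_no_stop(100))
print(find_orfs("CCATGAAACCCTAGGG", min_codons=3))
```

Under these assumptions, a given stretch of 100 random codons is free of stop codons a little under 1% of the time. That probability is small for any one window, but a genome offers a very large number of windows across six reading frames, so many such ORFs are expected by chance.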

[55] A recent study of twelve Drosophila species additionally identified a higher proportion of de novo genes with testis-biased expression compared to the annotated proteome.

[135] In humans, a study that identified 60 human-specific de novo genes found that their average expression, as measured by RNA-seq, was highest in the testes.

[139] Along with the immune-privileged nature of the testes, this promiscuous transcription is thought to create the ideal conditions for the expression of non-genic sequences required for de novo gene birth.

[140] De novo gene birth is thought to be favored in populations that evolve local solutions, as the relatively high error rate will result in a pool of cryptic variation that is “preadapted” through the purging of deleterious sequences.

[143] With respect to other predicted structural features such as β-strand content and aggregation propensity, the peptides encoded by proto-genes are similar to non-genic sequences and categorically distinct from canonical genes.

One such example is FLJ33706, a de novo gene that was identified in GWAS and linkage analyses for nicotine addiction and shows elevated expression in the brains of Alzheimer's patients.

Many of these young genes show signatures of positive selection, and functional annotations indicate that they are involved in diverse molecular processes, but are enriched for transcription factors.

[163] In addition to their roles in cancer processes, de novo originated human genes have been implicated in the maintenance of pluripotency[164] and in immune function.

Novel genes can emerge from ancestrally non-genic regions through poorly understood mechanisms. (A) A non-genic region first gains transcription and an open reading frame (ORF), in either order, facilitating the birth of a de novo gene. The ORF is for illustrative purposes only, as de novo genes may also lack an ORF, as with RNA genes. (B) Overprinting. A novel ORF is created that overlaps with an existing ORF, but in a different frame. (C) Exonization. A formerly intronic region becomes alternatively spliced as an exon, such as when repetitive sequences are acquired through retroposition and new splice sites are created through mutational processes. Overprinting and exonization may be considered as special cases of de novo gene birth.
Novel genes can be formed from ancestral genes through a variety of mechanisms.[1] (A) Duplication and divergence. Following duplication, one copy experiences relaxed selection and gradually acquires novel function(s). (B) Gene fusion. A hybrid gene formed from some or all of two previously separate genes. Gene fusions can occur by different mechanisms; shown here is an interstitial deletion. (C) Gene fission. A single gene separates to form two distinct genes, such as by duplication and differential degeneration of the two copies.[2] (D) Horizontal gene transfer. Genes acquired from other species by horizontal transfer undergo divergence and neofunctionalization. (E) Retroposition. Transcripts may be reverse transcribed and integrated as an intronless gene elsewhere in the genome. This new gene may then undergo divergence.