Directed evolution

Some calculations suggest it is entirely feasible that for all practical (i.e. functional and structural) purposes, protein sequence space has been fully explored during the course of evolution of life on Earth.

[13] The starting gene can be mutagenised by random point mutations (using chemical mutagens or error-prone PCR)[14][15] and by insertions and deletions (using transposons).
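As a rough illustration of what random point mutagenesis does to a gene, the sketch below simulates an error-prone-PCR-like process in which every base is independently replaced with probability `rate`. It assumes a uniform substitution spectrum; real polymerases do not mutate all bases equally, and the transposon-based insertions and deletions mentioned above are not modelled. The function names `error_prone_mutate` and `make_library` are hypothetical.

```python
import random

BASES = "ACGT"

def error_prone_mutate(seq: str, rate: float, rng: random.Random) -> str:
    """Return a copy of `seq` in which each base is independently replaced,
    with probability `rate`, by one of the other bases (uniform spectrum assumed)."""
    out = []
    for base in seq:
        if rng.random() < rate:
            # Pick uniformly among the bases that differ from the original one.
            out.append(rng.choice([b for b in BASES if b != base]))
        else:
            out.append(base)
    return "".join(out)

def make_library(parent: str, size: int, rate: float, seed: int = 0) -> list[str]:
    """Generate `size` independently mutagenised copies of `parent`."""
    rng = random.Random(seed)
    return [error_prone_mutate(parent, rate, rng) for _ in range(size)]

# Hypothetical parent sequence used only for illustration.
parent = "ATGGCTAGCAAAGGAGAAGAACTTTTCACTGGAGTTGTCCCAATTCTTGTT"
library = make_library(parent, size=5, rate=0.02)
for variant in library:
    n_mut = sum(a != b for a, b in zip(parent, variant))
    print(n_mut, variant)
```

Raising `rate` increases the average number of mutations per variant, which is the main knob in error-prone PCR protocols.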

Finally, specific regions of a gene can be systematically randomised[19] for a more focused approach based on knowledge of structure and function.
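One common way to randomise a chosen codon is to synthesise it with degenerate bases such as NNK (N = any base, K = G or T); the text above does not say which scheme is used, so NNK and NNS are assumptions here. The sketch expands a degenerate codon into concrete codons and translates them with the standard genetic code to show how many codons, amino acids and stop codons each scheme yields. The helper names are hypothetical.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI table 1), codons ordered with bases T, C, A, G.
_BASES = "TCAG"
_AA = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: _AA[16 * i + 4 * j + k]
    for i, a in enumerate(_BASES)
    for j, b in enumerate(_BASES)
    for k, c in enumerate(_BASES)
}

# Subset of the IUPAC degenerate-base codes.
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T",
         "N": "ACGT", "K": "GT", "S": "CG"}

def expand(pattern: str) -> list[str]:
    """All concrete codons matching a degenerate codon such as 'NNK'."""
    return ["".join(p) for p in product(*(IUPAC[ch] for ch in pattern))]

def summarize(pattern: str) -> tuple[int, int, int]:
    """(number of codons, distinct amino acids, stop codons) for a pattern."""
    codons = expand(pattern)
    aas = Counter(CODON_TABLE[c] for c in codons)
    stops = aas.pop("*", 0)
    return len(codons), len(aas), stops

for pattern in ("NNN", "NNK", "NNS"):
    print(pattern, summarize(pattern))
```

NNK and NNS halve the number of codons relative to NNN while still covering all 20 amino acids and leaving only one stop codon (TAG), which is why reduced-degeneracy codons are popular for focused randomisation.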

[20] Because beneficial variants are rare, a high-throughput assay is vital for measuring activity and finding the few variants whose mutations improve the desired properties.

[21][22] During in vivo evolution, each cell (usually bacteria or yeast) is transformed with a plasmid containing a different member of the variant library.
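Because each transformed cell receives one randomly drawn library member, some variants are sampled several times and others not at all. A common back-of-the-envelope model, assumed here, treats every variant as equally likely, so the expected fraction of the library represented after T transformants is 1 - exp(-T/V). The sketch below uses it to estimate how many transformants are needed; the function names are hypothetical.

```python
import math

def expected_coverage(n_transformants: int, library_size: int) -> float:
    """Expected fraction of distinct variants seen at least once, assuming
    every variant is equally likely to be picked (Poisson approximation)."""
    return 1.0 - math.exp(-n_transformants / library_size)

def transformants_needed(library_size: int, coverage: float) -> int:
    """Smallest number of transformants giving the requested expected coverage."""
    return math.ceil(-library_size * math.log(1.0 - coverage))

print(transformants_needed(1000, 0.95))          # 2996, i.e. roughly 3x oversampling
print(round(expected_coverage(3000, 1000), 3))
```

Real libraries are rarely uniform, so these numbers are a lower bound on the oversampling actually required.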

This format has the advantage of selecting for properties in a cellular environment, which is useful when the evolved protein or RNA is to be used in living organisms.

When performed without cells, DE uses in vitro transcription-translation to produce proteins or RNA, either free in solution or compartmentalised in artificial microdroplets.
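Compartmentalisation only keeps a protein linked to its own gene if most occupied droplets hold a single gene. A standard way to reason about this, assumed here rather than taken from the text, is that genes are distributed randomly among droplets, so the number per droplet follows a Poisson distribution with mean `lam`. The sketch computes the fraction of empty droplets and, among occupied droplets, the fraction holding exactly one gene; the names are hypothetical.

```python
import math

def occupancy(lam: float, k: int) -> float:
    """Poisson probability that a droplet contains exactly k genes,
    given a mean of `lam` genes per droplet."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

def single_occupancy_purity(lam: float) -> float:
    """Among droplets that contain any gene, the fraction holding exactly one."""
    return occupancy(lam, 1) / (1.0 - occupancy(lam, 0))

for lam in (0.1, 0.3, 1.0):
    print(lam, round(occupancy(lam, 0), 3), round(single_occupancy_purity(lam), 3))
```

Lower loading improves the gene-protein linkage at the cost of many empty droplets, which is the trade-off this model makes visible.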

Selection systems are also less expensive and less labour-intensive than screening; however, they are typically difficult to engineer, prone to artefacts, and give no information on the range of activities present in the library.

In screening, each variant gene is individually expressed and assayed to quantitatively measure its activity (most often via a chromogenic or fluorogenic product).
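A typical way to turn raw screening signals into hits is to normalise each variant's signal to parent-gene controls assayed alongside it and keep variants above a fold-improvement cut-off. The sketch below assumes a plate with parent-control wells, a 1.5-fold threshold and hypothetical fluorescence readings; `rank_hits` is a hypothetical helper, not a standard tool.

```python
from statistics import mean

def rank_hits(readings: dict[str, float], parent_wells: list[str],
              min_fold: float = 1.5) -> list[tuple[str, float]]:
    """Return (well, fold improvement) for variants whose signal is at least
    `min_fold` times the mean signal of the parent-control wells, best first."""
    baseline = mean(readings[w] for w in parent_wells)
    hits = [(w, readings[w] / baseline)
            for w in readings
            if w not in parent_wells and readings[w] / baseline >= min_fold]
    return sorted(hits, key=lambda item: item[1], reverse=True)

# Hypothetical fluorescence readings (arbitrary units) from one plate.
plate = {"A1": 100.0, "A2": 104.0, "B1": 96.0,   # parent controls
         "C1": 210.0, "C2": 88.0, "C3": 155.0, "C4": 149.0}
print(rank_hits(plate, parent_wells=["A1", "A2", "B1"]))
```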

Even the highest-throughput screening assays usually have lower coverage than selection methods, but they have the advantage of producing detailed information on each screened variant.
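The contrast between the two formats can be made concrete with a toy model: selection processes the whole library but only reports which variants passed a survival threshold, whereas screening assays a smaller sample but returns a quantitative value for every variant it measures. The library, the exponential activity distribution, the threshold and the screening capacity below are all hypothetical.

```python
import random

def select(activities: dict[str, float], threshold: float) -> set[str]:
    """Selection: only variants at or above the survival threshold come back,
    with no record of how active they, or the rest, were."""
    return {v for v, a in activities.items() if a >= threshold}

def screen(activities: dict[str, float], capacity: int,
           rng: random.Random) -> dict[str, float]:
    """Screening: a random subset of at most `capacity` variants is assayed,
    and every assayed variant returns a quantitative activity value."""
    sampled = rng.sample(sorted(activities), k=min(capacity, len(activities)))
    return {v: activities[v] for v in sampled}

rng = random.Random(0)
# Hypothetical library of 10,000 variants, most with low activity.
library = {f"v{i}": rng.expovariate(1.0) for i in range(10_000)}
survivors = select(library, threshold=5.0)
measured = screen(library, capacity=500, rng=rng)
print(len(survivors), "survivors (identities only)")
print(len(measured), "screened; best:", round(max(measured.values()), 2))
```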

[30] A limitation of directed evolution is that a high-throughput assay is required to measure the effects of large numbers of random mutations.

[33] Recent theoretical work has aimed to overcome limits on the speed of evolution by applying counter-diabatic driving techniques from statistical physics, though this has yet to be implemented in a directed evolution experiment.

[1][35] Beneficial mutations are rare, so large numbers of random mutants have to be screened to find improved variants.
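How large "large numbers" must be can be estimated with a simple model, assumed here: if each random mutant is independently improved with probability f, the chance that n screened mutants contain at least one improved variant is 1 - (1 - f)^n. Solving for n gives the screening effort needed at a chosen confidence. The fractions used below are hypothetical.

```python
import math

def variants_to_screen(beneficial_fraction: float, confidence: float) -> int:
    """Number of random variants to assay so that, with probability
    `confidence`, at least one improved variant is among them."""
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - beneficial_fraction))

for f in (1e-2, 1e-3, 1e-4):
    print(f, variants_to_screen(f, 0.95))
```

Each tenfold drop in the fraction of beneficial mutants raises the required screening effort roughly tenfold, which is why assay throughput dominates experimental design.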

'Focused libraries' concentrate the mutagenesis step of DE on regions thought to be richer in beneficial mutations.
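Focusing matters because diversity grows exponentially with the number of randomised positions. The sketch assumes each position is randomised with an NNK-type codon (32 codons encoding 20 amino acids) and finds how many positions can be fully randomised before the codon-level diversity exceeds a given screening capacity; it ignores the oversampling needed for complete coverage. The names and the capacity value are hypothetical.

```python
def library_sizes(n_positions: int) -> tuple[int, int]:
    """(codon-level, protein-level) diversity when `n_positions` sites are
    each randomised with an NNK-type codon (32 codons, 20 amino acids)."""
    return 32 ** n_positions, 20 ** n_positions

def max_positions(screening_capacity: int) -> int:
    """Largest number of fully randomised positions whose codon-level
    diversity still fits within `screening_capacity` assays."""
    k = 0
    while 32 ** (k + 1) <= screening_capacity:
        k += 1
    return k

for k in range(1, 5):
    print(k, library_sizes(k))
print(max_positions(10_000))   # 2, since 32**2 = 1024 and 32**3 = 32768
```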

[41] As a protein engineering tool, DE has been most successful in three areas:

The study of natural evolution is traditionally based on extant organisms and their genes.

Directed evolution in the laboratory allows detailed measurement of evolutionary processes, for example epistasis, evolvability, adaptive constraint,[60][61] fitness landscapes,[62] and neutral networks.
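Epistasis is one of the quantities such experiments measure. Several definitions exist; the sketch assumes the common multiplicative (log-additive) model, in which the fitness effect of a double mutant is compared with the product of the two single-mutant effects. The activity values are hypothetical.

```python
import math

def epistasis(w_wt: float, w_a: float, w_b: float, w_ab: float) -> float:
    """Deviation of the double mutant from a multiplicative expectation:
    eps = log(w_ab/w_wt) - log(w_a/w_wt) - log(w_b/w_wt).
    eps > 0: the pair is better than expected; eps < 0: worse."""
    return (math.log(w_ab / w_wt)
            - math.log(w_a / w_wt)
            - math.log(w_b / w_wt))

# Hypothetical activities relative to the wild type (w_wt = 1.0).
print(math.isclose(epistasis(1.0, 2.0, 3.0, 6.0), 0.0, abs_tol=1e-12))  # no epistasis
print(epistasis(1.0, 2.0, 3.0, 12.0) > 0)                               # positive epistasis
```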

For example, global proteome-wide substitutions of natural amino acids with fluorinated analogs have been attempted in Escherichia coli[64] and Bacillus subtilis.

[65] In 2015, Budisa and Söll reported the complete substitution of tryptophan with thienopyrrole-alanine in response to 20,899 UGG codons in Escherichia coli.

An example of directed evolution compared with natural evolution. The inner cycle shows the three stages of the directed evolution cycle, with the natural process being mimicked in brackets. The outer circle shows the steps of a typical experiment. Red symbols indicate functional variants; pale symbols indicate variants with reduced function.
Directed evolution is analogous to climbing a hill on a 'fitness landscape', where elevation represents the desired property. Each round of selection samples mutants on all sides of the starting template (1) and selects the mutant with the highest elevation, thereby climbing the hill. This is repeated until a local summit is reached (2).
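The hill-climbing picture can be simulated as a greedy adaptive walk. The sketch assumes a rugged toy landscape in which every binary genotype receives an independent random fitness (a "house of cards" model), samples all single mutants of the current template each round, keeps the fittest, and stops when no mutant improves on the template, i.e. at a local summit. The landscape and names are hypothetical.

```python
import random
from itertools import product

def random_landscape(length: int, seed: int = 0) -> dict[tuple[int, ...], float]:
    """Rugged toy landscape: each binary genotype gets an independent random
    fitness, so local summits are common."""
    rng = random.Random(seed)
    return {g: rng.random() for g in product((0, 1), repeat=length)}

def neighbours(g: tuple[int, ...]) -> list[tuple[int, ...]]:
    """All single-point mutants of genotype g."""
    return [g[:i] + (1 - g[i],) + g[i + 1:] for i in range(len(g))]

def adaptive_walk(landscape: dict[tuple[int, ...], float],
                  start: tuple[int, ...]) -> list[tuple[int, ...]]:
    """Greedy climb: each round samples every single mutant of the current
    template and keeps the fittest, stopping when none is better."""
    path = [start]
    current = start
    while True:
        best = max(neighbours(current), key=landscape.__getitem__)
        if landscape[best] <= landscape[current]:
            return path          # local summit reached
        current = best
        path.append(current)

land = random_landscape(length=10, seed=1)
start = (0,) * 10
path = adaptive_walk(land, start)
summit = path[-1]
global_best = max(land, key=land.__getitem__)
print(len(path) - 1, "rounds; summit fitness", round(land[summit], 3))
print("global optimum reached:", summit == global_best)
```

On a rugged landscape the walk usually halts on a local peak rather than the global one, which mirrors the local summit in the figure.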
Starting gene (left) and library of variants (right). Point mutations change single nucleotides. Insertions and deletions add or remove sections of DNA. Shuffling recombines segments of two (or more) similar genes.
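The recombination outcome of shuffling can be approximated by building chimeras that copy from one parent and occasionally switch to another. This is a simplified sketch assuming aligned, equal-length parents and a uniform per-position switching probability; in the laboratory, crossovers arise while gene fragments are reassembled and tend to occur where the parents share sequence identity. The function name and placeholder sequences are hypothetical.

```python
import random

def shuffle_parents(parents: list[str], crossover_rate: float,
                    rng: random.Random) -> str:
    """Build one chimera from aligned, equal-length parent genes: start from a
    random parent and, at each position, re-pick the template parent with
    probability `crossover_rate`."""
    if len({len(p) for p in parents}) != 1:
        raise ValueError("parents must be aligned and of equal length")
    template = rng.randrange(len(parents))
    child = []
    for i in range(len(parents[0])):
        if rng.random() < crossover_rate:
            template = rng.randrange(len(parents))
        child.append(parents[template][i])
    return "".join(child)

rng = random.Random(3)
parent_a = "A" * 40   # placeholder sequences so each segment's origin is visible
parent_b = "G" * 40
for _ in range(3):
    print(shuffle_parents([parent_a, parent_b], crossover_rate=0.1, rng=rng))
```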
How DNA libraries generated by random mutagenesis sample sequence space. The amino acid substituted into a given position is shown. Each dot or set of connected dots is one member of the library. Error-prone PCR randomly mutates some residues to other amino acids. Alanine scanning replaces each residue of the protein with alanine, one by one. Site saturation substitutes each of the 20 possible amino acids (or some subset of them) at a single position, one by one.
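The two systematic schemes in this caption can be written as simple enumerations at the protein level. The sketch assumes 0-based positions, skips positions that are already alanine in the alanine scan, and uses the 19 non-wild-type standard amino acids for site saturation; the peptide and function names are hypothetical.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

def alanine_scan(protein: str) -> list[str]:
    """One variant per position with that residue replaced by alanine
    (positions that are already alanine are skipped)."""
    return [protein[:i] + "A" + protein[i + 1:]
            for i, aa in enumerate(protein) if aa != "A"]

def site_saturation(protein: str, position: int) -> list[str]:
    """All single substitutions at one 0-based position, using each of
    the other 19 standard amino acids."""
    wt = protein[position]
    return [protein[:position] + aa + protein[position + 1:]
            for aa in AMINO_ACIDS if aa != wt]

seq = "MKTAYIAK"   # hypothetical short peptide
print(alanine_scan(seq))
print(len(site_saturation(seq, 2)))   # 19
```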
An expressed protein can either be covalently linked to its gene (as in mRNA, left) or compartmentalized with it (cells or artificial compartments, right). Either way, the gene can be isolated based on the activity of the encoded protein.