Comparative genomics

[5] In comparative genomics, synteny is the preserved order of genes on chromosomes of related species indicating their descent from a common ancestor.
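As a rough illustration of preserved gene order, the following Python sketch finds runs of genes that are adjacent in two genomes, either in the same or in inverted order. It assumes each gene has exactly one ortholog with the same identifier in the other genome, and it ends a run at any gene without an ortholog. The function name `collinear_blocks` and the `min_genes` parameter are hypothetical and are not taken from any published synteny tool.

```python
from typing import Dict, List


def collinear_blocks(order_a: List[str], order_b: List[str],
                     min_genes: int = 2) -> List[List[str]]:
    """Return runs of genes that are adjacent in both gene orders.

    A run may be in the same direction in both genomes (step +1) or
    inverted in genome B (step -1). Gene identifiers are assumed unique.
    """
    pos_b: Dict[str, int] = {gene: i for i, gene in enumerate(order_b)}
    blocks: List[List[str]] = []
    current: List[str] = []
    step = 0  # +1 forward, -1 inverted, 0 not yet determined

    for gene in order_a:
        p = pos_b.get(gene)
        if current and p is not None:
            delta = p - pos_b[current[-1]]
            # Extend the run only if the gene is adjacent in B and the
            # direction agrees with the run so far.
            if delta in (1, -1) and step in (0, delta):
                current.append(gene)
                step = delta
                continue
        # Otherwise close the current run and start a new one.
        if len(current) >= min_genes:
            blocks.append(current)
        current = [gene] if p is not None else []
        step = 0

    if len(current) >= min_genes:
        blocks.append(current)
    return blocks


species_1 = ["a", "b", "c", "d", "e", "f", "g"]
species_2 = ["a", "b", "c", "x", "g", "f", "e"]
blocks = collinear_blocks(species_1, species_2)
print(blocks)
```

In this toy input, genes a to c form one block in the same order, and genes e to g form a second block that is inverted in the second species. Real synteny programs typically tolerate gaps and gene duplications, which this sketch does not.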

[12] Comparisons of genome synteny between and within species have provided an opportunity to study evolutionary processes that lead to the diversity of chromosome number and structure in many lineages across the tree of life;[13][14] early discoveries using such approaches include chromosomal conserved regions in nematodes and yeast,[15][16] evolutionary history and phenotypic traits of extremely conserved Hox gene clusters across animals and MADS-box gene family in plants,[17][18] and karyotype evolution in mammals and plants.

The system helps researchers to identify large rearrangements, single base mutations, reversals, tandem repeat expansions and other polymorphisms.

[29] At the same time, Bonnie Berger, Eric Lander, and their team published a paper on whole-genome comparison of human and mouse.

Instead of undertaking their own analyses, most biologists can access these large cross-species comparisons and avoid the impracticality caused by the size of the genomes.

These methods can also quickly uncover single-nucleotide polymorphisms, insertions and deletions by mapping unassembled reads against a well annotated reference genome, and thus provide a list of possible gene differences that may be the basis for any functional variation among strains.
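The idea of listing candidate differences from mapped reads can be sketched in Python as follows. This is a deliberately simplified illustration, not the method of any particular tool: it assumes the reads have already been placed on the reference without gaps (so insertions and deletions are not handled), and the function name `call_snps` and the thresholds `min_depth` and `min_fraction` are hypothetical choices.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def call_snps(reference: str,
              placed_reads: List[Tuple[int, str]],
              min_depth: int = 3,
              min_fraction: float = 0.8) -> List[Tuple[int, str, str]]:
    """List positions where mapped reads consistently disagree with the reference.

    placed_reads holds (start_position, read_sequence) pairs, 0-based,
    assumed to be gapless alignments against the reference.
    """
    # Count the bases observed at every covered reference position.
    pileup: Dict[int, Counter] = defaultdict(Counter)
    for start, seq in placed_reads:
        for offset, base in enumerate(seq):
            pos = start + offset
            if 0 <= pos < len(reference):
                pileup[pos][base] += 1

    snps = []
    for pos in sorted(pileup):
        counts = pileup[pos]
        depth = sum(counts.values())
        base, support = counts.most_common(1)[0]
        # Report a candidate only with enough coverage and agreement.
        if (depth >= min_depth and base != reference[pos]
                and support / depth >= min_fraction):
            snps.append((pos, reference[pos], base))
    return snps


reference = "ACGTACGT"
reads = [(0, "ACGAAC"), (1, "CGAACG"), (2, "GAACGT")]
print(call_snps(reference, reads))
```

Each reported tuple gives the 0-based position, the reference base, and the base supported by the reads. Production pipelines additionally model base quality, mapping quality, and diploid genotypes.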

Based on a variety of biological genome data and the study of vertical and horizontal evolution processes, one can understand vital parts of the gene structure and its regulatory function.

It is however often complicated by the multiplicity of events that have taken place throughout the history of individual lineages, leaving only distorted and superimposed traces in the genome of each living organism.

[32][33] Comparative genomics plays a crucial role in identifying copy number variations (CNVs) and understanding their significance in evolution.

CNVs, which involve deletions or duplications of large segments of DNA, are recognized as a major source of genetic diversity, influencing gene structure, dosage, and regulation.

While single nucleotide polymorphisms (SNPs) are more common, CNVs impact larger genomic regions and can have profound effects on phenotype and diversity.

Ongoing research aims to address these questions using techniques like comparative genomic hybridization, which allows for a detailed examination of CNVs and their significance.
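In comparative genomic hybridization, copy number is commonly summarized as the log2 ratio of the test signal to the reference signal at each probe. The Python sketch below illustrates that calculation under simple assumptions: signals are positive and already normalized, and the cut-offs of +0.3 and -0.3 are hypothetical illustrative values rather than a recommended standard. The function name `classify_cnv` is likewise an assumption.

```python
import math
from typing import List, Tuple


def classify_cnv(test_signal: List[float],
                 ref_signal: List[float],
                 gain: float = 0.3,
                 loss: float = -0.3) -> List[Tuple[float, str]]:
    """Return (log2 ratio, call) for each probe.

    The call is "gain", "loss", or "neutral" depending on where the
    log2(test / reference) ratio falls relative to the two thresholds.
    """
    if len(test_signal) != len(ref_signal):
        raise ValueError("signal lists must have the same length")

    calls = []
    for t, r in zip(test_signal, ref_signal):
        ratio = math.log2(t / r)
        if ratio >= gain:
            label = "gain"
        elif ratio <= loss:
            label = "loss"
        else:
            label = "neutral"
        calls.append((ratio, label))
    return calls


test = [100.0, 150.0, 50.0, 100.0]
ref = [100.0, 100.0, 100.0, 100.0]
for ratio, label in classify_cnv(test, ref):
    print(f"{ratio:+.2f} {label}")
```

For a diploid region, a single-copy gain corresponds to a ratio of 3/2 (log2 of about 0.58) and a single-copy loss to a ratio of 1/2 (log2 of -1), which is why thresholds are usually placed between zero and these values.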

[36] Comparative genomics holds profound significance across various fields, including medical research, basic biology, and biodiversity conservation.

[37][38][39] To tackle this challenge, comparative genomics offers a solution by pinpointing nucleotide positions that have remained unchanged over millions of years of evolution.

These conserved regions indicate potential sites where genetic alterations could have detrimental effects on an organism's fitness, thus guiding the search for disease-causing variants.
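A minimal way to picture unchanged nucleotide positions is to scan a multiple sequence alignment for columns in which every species carries the same base. The Python sketch below does this under the assumption that the sequences are already aligned to equal length with `-` as the gap character; the function name `conserved_columns` is hypothetical, and real conservation scores also account for phylogeny and substitution rates.

```python
from typing import List


def conserved_columns(alignment: List[str], gap: str = "-") -> List[int]:
    """Return 0-based indices of alignment columns that are identical
    in every sequence and contain no gap character."""
    if not alignment:
        return []
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    conserved = []
    for i in range(length):
        column = {seq[i] for seq in alignment}
        # A conserved column has exactly one distinct, non-gap symbol.
        if len(column) == 1 and gap not in column:
            conserved.append(i)
    return conserved


aligned = [
    "ACGT-A",  # species 1
    "ACCT-A",  # species 2
    "ACGTTA",  # species 3
]
print(conserved_columns(aligned))
```

Columns 2 and 4 are excluded because they contain a substitution and a gap, respectively.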

Moreover, comparative genomics holds promise in unraveling the mechanisms of gene evolution, environmental adaptations, gender-specific differences, and population variations across vertebrate lineages.

[41] For instance, in animal genetics, indigenous cattle exhibit superior disease resistance and environmental adaptability but lower productivity compared to exotic breeds.

[44] Computational approaches will remain critical for research and teaching, especially when information science and genome biology are taught in conjunction.

Additionally, ongoing efforts focus on optimizing existing algorithms to handle the vast amount of genome sequence data by enhancing their speed.

It integrates elements of colinear sequence alignment and gene orthology prediction, presenting a greater challenge due to the vast size and intricate nature of whole genomes.

Despite its complexity, numerous methods have emerged to tackle this problem because WGAs play a crucial role in various genome-wide analyses, such as phylogenetic inference, genome annotation, and function prediction.

Analysis based on coalescence theory tries to predict the amount of time between the introduction of a mutation and a particular allele or gene distribution in a population.
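As an illustration of the kind of time estimate coalescent theory provides, the Python sketch below computes expected waiting times under the standard neutral coalescent. It assumes a diploid population of constant effective size N with random mating, none of which is stated in the text above, and the function names are hypothetical. Under these assumptions, the expected time during which a sample has k ancestral lineages is 4N / (k(k - 1)) generations.

```python
from typing import Dict


def expected_coalescence_times(sample_size: int, n_eff: float) -> Dict[int, float]:
    """Expected time (in generations) spent with k lineages, for
    k = sample_size down to 2, in a diploid population of size n_eff."""
    if sample_size < 2:
        raise ValueError("sample_size must be at least 2")
    return {k: 4.0 * n_eff / (k * (k - 1))
            for k in range(sample_size, 1, -1)}


def expected_tmrca(sample_size: int, n_eff: float) -> float:
    """Expected time to the most recent common ancestor of the sample,
    obtained by summing the waiting times for every number of lineages."""
    return sum(expected_coalescence_times(sample_size, n_eff).values())


# Two sampled gene copies in a population with N = 1000.
print(expected_tmrca(2, 1000))
# Ten sampled gene copies in the same population.
print(expected_tmrca(10, 1000))
```

The sum simplifies to 4N(1 - 1/n) for a sample of n gene copies, so the expected age of the common ancestor approaches 4N generations as the sample grows.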

Identifying the loci of advantageous genes is a key step in breeding crops that are optimized for greater yield, cost-efficiency, quality, and disease resistance.

Previous methods of identifying loci associated with agronomic performance required several generations of carefully monitored breeding of parent strains, a time-consuming effort that is unnecessary for comparative genomic studies.

[78] In May 2019, using the Global Genome Set, a team in the UK and Australia sequenced thousands of globally-collected isolates of Group A Streptococcus, providing potential targets for developing a vaccine against the pathogen, also known as S.

Because of their morphological, physiological, and genetic resemblance to humans, mice and rats have long been the preferred animal models for biomedical research.

To understand T cell receptors (TCRs) and their genes, Glusman conducted research on the sequencing of the human and mouse T cell receptor loci.

Comparisons of the genomic sequences within each locus (the physical site or location of a specific gene on a chromosome) and across species allow for research on other mechanisms and other regulatory signals.

Some suggest new hypotheses about the evolution of TCRs, to be tested (and improved) by comparison to the TCR gene complement of other vertebrate species.

A comparative genomic investigation of humans and mice will allow for the discovery and annotation of many other genes, as well as the identification of regulatory sequences in other species.

Whole genome alignment is a typical method in comparative genomics. This alignment of eight Yersinia bacteria genomes reveals 78 locally collinear blocks conserved among all eight taxa. Each chromosome has been laid out horizontally and homologous blocks in each genome are shown as identically colored regions linked across genomes. Regions that are inverted relative to Y. pestis KIM are shifted below a genome's center axis. [1]
The human FOXP2 gene and its evolutionary conservation are shown in a multiple alignment (at bottom of figure) in this image from the UCSC Genome Browser. Note that conservation tends to cluster around coding regions (exons).
Phylogenetic tree of descendant species and reconstructed ancestors. The branch color represents breakpoint rates in RACFs (breakpoints per million years). Black branches represent nondetermined breakpoint rates. Tip colors depict assembly contiguity: black, scaffold-level genome assembly; green, chromosome-level genome assembly; yellow, chromosome-scale scaffold-level genome assembly. Numbers next to species names indicate diploid chromosome number (if known). [46]
Chromosome by chromosome variation of indicine and taurine cattle. The genomic structural differences on chromosome X between indicine (Bos indicus Nelore cattle) and taurine cattle (Bos taurus Hereford cattle) were identified using the SyRI tool.
Example of a phylogenetic tree created from an alignment of 250 unique spike protein sequences from the Betacoronavirus family.
Example of synteny block and break. Genes located on chromosomes of two species are denoted in letters. Each gene is associated with a number representing the species it belongs to (species 1 or 2). Orthologous genes are connected by dashed lines and genes without an orthologous relationship are treated as gaps in synteny programs. [57]
Solid green squares indicate mammalian chromosomes maintained as a single synteny block (either as a single chromosome or fused with another MAM), with shades of the color indicating the fraction of the chromosome affected by intra-chromosomal rearrangements (the lightest shade is most affected). Split blocks demarcate mammalian chromosomes affected by inter-chromosomal rearrangements. Upper (green) triangles show the fraction of the chromosome affected by intra-chromosomal rearrangements, and lower (red) triangles show the fraction affected by inter-chromosomal rearrangements. Syntenic relationships of each MAM to the human genome are given at the right of the diagram. MAMX appears split in goat because its X chromosome is assembled as two separate fragments. BOR, boreoeutherian ancestor chromosome; EUA, Euarchontoglires ancestor chromosome; EUC, Euarchonta ancestor chromosome; EUT, eutherian ancestor chromosome; PMT, Primatomorpha ancestor chromosome; PRT, primates (Hominidae) ancestor chromosome; THE, therian ancestor chromosome.
Image from the study Evolution of the ancestral mammalian karyotype and syntenic regions. It is a visualization of the evolutionary history of reconstructed mammalian chromosomes based on the human lineage. [46]
TCR loci from humans (H, top) and mice (M, bottom) are compared, with non-TCR genes in purple, V segments in orange, and other TCR elements in red. M6A, a putative methyltransferase; ZNF, a zinc-finger protein; OR, olfactory receptor genes; DAD1, defender against cell death. The sites of species-specific, processed pseudogenes are shown by gray triangles. See also GenBank accession numbers AE000658-62. Modified after Glusman et al. 2001. [81]
[Figure 2] Gene structure of the human (top) and mouse (bottom) V, D, J, and C gene segments. The arrows represent the transcriptional direction of each TCR gene. The squares and circles indicate direct and reverse orientation. Modified after Glusman et al. 2001. [81]