In the fields of molecular biology and genetics, a pan-genome (pangenome or supragenome) is the entire set of genes from all strains within a clade.
[2] The genetic repertoire of a bacterial species is much larger than the gene content of an individual strain.
[13] An open access book reviewing the pangenome concept and its implications, edited by Tettelin and Medini, was published in the spring of 2020.
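[citation needed] The pan-genome can be somewhat arbitrarily classified as open or closed based on the exponent alpha (α) of Heaps' law, N = k n^(-α), where N is the number of new gene families found when the n-th genome is added and k is a proportionality constant. If α ≤ 1 the pangenome is considered open, because the total number of gene families keeps growing as genomes are added; if α > 1 it is considered closed.
[23][15] Pangenome software can usually estimate the Heaps' law parameters that best fit the data.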
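As an illustration of how such parameters might be estimated, the following minimal sketch counts the gene families first seen as each genome is added and then fits Heaps' law by ordinary least squares on log-transformed values. It assumes the form N = k n^(-α) with N the number of new gene families at the n-th genome. The function names and the toy data are hypothetical, and this is not the procedure of any particular pangenome tool.

```python
import math


def new_genes_per_genome(genomes):
    """Return how many gene families are first seen at each genome, in order."""
    seen = set()
    counts = []
    for genes in genomes:
        fresh = set(genes) - seen
        counts.append(len(fresh))
        seen |= fresh
    return counts


def fit_heaps(genome_counts, new_gene_counts):
    """Fit log(N) = log(k) - alpha * log(n) by least squares.

    Points with zero new genes are skipped because log(0) is undefined.
    Returns the pair (k, alpha).
    """
    points = [
        (math.log(n), math.log(c))
        for n, c in zip(genome_counts, new_gene_counts)
        if n > 0 and c > 0
    ]
    if len(points) < 2:
        raise ValueError("need at least two points with new genes")
    mean_x = sum(x for x, _ in points) / len(points)
    mean_y = sum(y for _, y in points) / len(points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    if sxx == 0:
        raise ValueError("all points share the same genome count")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    k = math.exp(mean_y - slope * mean_x)
    return k, -slope  # alpha is the negated slope


def classify_pangenome(alpha):
    """Apply the conventional threshold: alpha <= 1 is open, otherwise closed."""
    return "open" if alpha <= 1 else "closed"


# Toy data (hypothetical): new gene families decay as 1000 * n^(-0.5).
ns = list(range(2, 21))
new_counts = [1000 * n ** -0.5 for n in ns]
k, alpha = fit_heaps(ns, new_counts)
print(round(k), round(alpha, 3), classify_pangenome(alpha))
```

Because the count of new genes depends on the order in which genomes are added, analyses commonly average the curve over many random orderings before fitting; that step is left out here for brevity.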
Parasitic species and species that are specialists in a particular ecological niche are thought to tend toward closed pangenomes.
[26] Some studies suggest that prokaryote pangenomes are the result of adaptive, not neutral, evolution that confers on species the ability to migrate to new niches.
In 2011 genomic fluidity was proposed as a measure to categorize the gene-level similarity among groups of sequenced isolates.
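A minimal sketch of how genomic fluidity might be computed is shown here. It assumes the usual pairwise definition, which is not given in the text above: for each pair of genomes, the number of gene families found in only one of the two is divided by the sum of the gene-family counts of both genomes, and the ratios are averaged over all pairs. The function name and the toy genomes are hypothetical.

```python
from itertools import combinations


def genomic_fluidity(genomes):
    """Average, over all genome pairs, of unique families / total families.

    `genomes` is an iterable of collections of gene-family identifiers.
    Assumed definition: (U_k + U_l) / (M_k + M_l) for each pair (k, l).
    """
    gene_sets = [set(g) for g in genomes]
    if len(gene_sets) < 2:
        raise ValueError("need at least two genomes")
    ratios = []
    for a, b in combinations(gene_sets, 2):
        total = len(a) + len(b)
        if total == 0:
            raise ValueError("cannot compare two empty genomes")
        unique = len(a - b) + len(b - a)  # families present in only one genome
        ratios.append(unique / total)
    return sum(ratios) / len(ratios)


# Toy example (hypothetical gene-family identifiers).
isolates = [{"a", "b", "c"}, {"b", "c", "d"}, {"a", "b", "c"}]
print(genomic_fluidity(isolates))
```

Under this definition the value is 0 when all genomes carry identical gene families and 1 when no pair shares any family.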
[31] A 'metapangenome' has been defined as the outcome of analyzing pangenomes in conjunction with the environment, where the abundance and prevalence of gene clusters and genomes are recovered through shotgun metagenomes.
[33] Other authors consider that metapangenomics expands the concept of the pangenome by incorporating gene sequences obtained from uncultivated microorganisms through a metagenomic approach.
[35] The Anvi'o platform developed a workflow that integrates the analysis and visualization of metapangenomes by generating pangenomes and studying them in conjunction with metagenomes.
[32] In 2018, 87% of the available whole genome sequences were bacterial, fueling researchers' interest in calculating prokaryote pangenomes at different taxonomic levels.
[22] In 2015, the pangenome of 44 strains of Streptococcus pneumoniae showed that few new genes were discovered with each new genome sequenced (see figure).
[45] Among plants, there are examples of pangenome studies in model species, both diploid [9] and polyploid,[10] and a growing list of crops.
They have been reviewed by Eizenga et al. [52] As interest in pangenomes increased, several software tools were developed to help analyze this kind of data.
[55] Seven kinds of software have been developed to analyze pangenomes: tools that cluster homologous genes; identify SNPs; plot pangenomic profiles; build phylogenetic relationships of orthologous genes or families of strains or isolates; perform function-based searching; annotate and/or curate; and visualize.
[11][59] In 2018 panX was released, an interactive web tool that allows inspection of the evolutionary history of gene families.
[65] panX can display an alignment of genomes, a phylogenetic tree, a mapping of mutations, and an inference of gain and loss of the family on the core-genome phylogeny.
In 2019 OrthoVenn 2.0 [66] allowed comparative visualization of families of homologous genes in Venn diagrams for up to 12 genomes.
In 2023, BRIDGEcereal was developed to survey and graph indel-based haplotypes from a pan-genome through a gene model ID.
In 2020, a computational comparison of tools for extracting gene-based pangenomic content (such as GET_HOMOLOGUES, PanDelos, Roary, and others) was released.
The analysis took into account different bacterial populations, which were generated synthetically by varying evolutionary parameters.
Also in 2020, several tools introduced a graphical representation of pangenomes showing the contiguity of genes (PPanGGOLiN,[46] Panaroo[65]).
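To make the idea of gene contiguity concrete, the sketch below builds a simple graph in which nodes are gene families and an edge joins two families that are neighbors on a contig in at least one genome. It is an illustrative assumption about how such a representation could be organized, not the algorithm or API of PPanGGOLiN or Panaroo; the function name, the input layout, and the toy genomes are hypothetical.

```python
from collections import defaultdict


def build_contiguity_graph(genomes):
    """Build a gene-family adjacency graph.

    `genomes` maps a genome name to a list of contigs, each contig being an
    ordered list of gene-family identifiers.
    Returns (node_support, edge_support): for every family, and for every
    unordered pair of adjacent families, the set of genomes where it occurs.
    """
    node_support = defaultdict(set)
    edge_support = defaultdict(set)
    for name, contigs in genomes.items():
        for contig in contigs:
            for family in contig:
                node_support[family].add(name)
            # Consecutive genes on the same contig are treated as contiguous.
            for left, right in zip(contig, contig[1:]):
                if left != right:  # ignore tandem copies of one family
                    edge_support[frozenset((left, right))].add(name)
    return dict(node_support), dict(edge_support)


# Toy genomes (hypothetical): strain s2 lacks family B.
toy = {
    "s1": [["A", "B", "C"]],
    "s2": [["A", "C"], ["D"]],
}
nodes, edges = build_contiguity_graph(toy)
for pair, support in sorted(edges.items(), key=lambda item: sorted(item[0])):
    print(sorted(pair), sorted(support))
```

Storing the set of supporting genomes on each node and edge keeps the presence and absence information alongside the gene order, so families or adjacencies found in only a few genomes can be told apart from those shared by all.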