Maximum genetic diversity hypothesis

The maximum genetic diversity hypothesis is a scientific hypothesis about the process of molecular evolution, the study of genetic change in populations over time.[1][2]

The hypothesis starts with the observation that some regions of the genome are more likely to preserve mutations into the next generation than others.[3] This difference in the observed rate of mutation means some regions of the genome appear to mutate faster than others, and is theorized to relate to balancing the preservation of vital information relating to a species' function against its ability to mutate and adapt to new environmental niches.[2][3] According to the hypothesis, these regions of the genome eventually drift into two rough categories: faster-mutating sections tuned to respond quickly to environmental pressures and allow adaptive radiation, as well as slower-mutating sections involved in an organism's most fundamental instructions.[2][1]

Because the hypothesis asserts that only slow-mutating genes accurately reflect shared evolutionary history, it holds that relationships between species should be calculated from their "maximum genetic diversity" rather than from raw overall genetic similarity. Maximum genetic diversity is determined by measuring the frequency of mutations in specific corresponding regions of orthologous genes.[1][3]

According to the hypothesis, calculations based on mutations in these slow-mutating genes yield a chart of genetic ancestry that lines up with the fossil record, whereas measurements based on raw genetic similarity yield results that clash with it.[2][1][4] The grouping into fast and slow also leads to the proposal that complex organisms become genetically fragile and less tolerant of mutation over time as their genetic diversity decreases, since an increasing proportion of their genome will have become slow-mutating.

The hypothesis asserts that this is because increased organismal and social complexity means more of the genome is needed to preserve the expanding instruction manual necessary for complex behavior and function. More of an organism's genome must therefore become slow-mutating as the organism increases in complexity, since being slow-mutating preserves and protects those vital instructions.[2][5][6]

The hypothesis is still relatively unknown, and it contradicts the current paradigm in molecular evolution, since the neutral theory's fundamental premises are still used almost universally in genetic analysis and admixture studies.[1][7][2] Additionally, some of the phenomena explained by the hypothesis could theoretically be accounted for by other processes such as gene conversion or concerted evolution.[2] Lastly, even if the neutral theory were disproved, that would not necessarily validate the hypothesis, as alternative theories have been proposed that also incorporate the effects of selection on the genome.[8]


According to the maximum genetic diversity hypothesis, evolution is an interplay between two processes. Short-term microevolution follows the neutral theory's expectation of a random but predictable rate of linear change. Longer-term macroevolution cannot be timed with the same clock, since diversification can proceed in punctuated fits and starts over long periods of time when a complex species disperses into an array of diverse environmental niches.[1][9]

As this occurs, the hypothesis predicts that each population will protect from mutation the slow-mutating section of its genome, which holds its most fundamental instructions, but will quickly preserve mutations at sites that provide greater environmental fitness under the pressures of each unique niche. Support for this supposition is drawn from a modeling study that sought to resolve the inconsistencies between the way regions of proteins seem to mutate in unison and the speed at which that happens. It observed that mutations seem to occur in "avalanches" that drastically alter specific regions of the genome, as commonly assumed, but do so only for short periods of time.

The fact that some genomic regions preserve mutations at different rates than others can be demonstrated when any three species separated by significant evolutionary time are compared two at a time: each pair can have aligned overlapping genomic positions in orthologous proteins where mutations get preserved at a far higher rate than the neutral theory's random drift statistically allows.[1]

Additionally, within the maximum genetic diversity framework, these overlapping regions not only have higher mutation rates but are also less likely to be involved with an organism's fundamental instructions.[2][3] This is supported by the observation that species with more of their genome designated for active-coding exons (stretches of nucleotides that are used by RNA to create proteins) have a lower proportion of high-mutation overlapping sites, and vice versa.[9][1]

Since an organism's fundamental genomic structure can be traced by the maximum possible diversity of observed mutations in orthologous genes shared with a given sister species, the theory does not measure evolutionary and genomic distance with a universal unit. Instead, the unit of measurement for each pairing is set by the maximum genetic diversity of an orthologous gene shared by the species. The simpler of any two organisms sets this quantity, which is held to be independent of both mutation rates and time.[2][1]

According to proponents, calculations derived from the neutral theory result in the false conclusion that fish are equidistant from every evolutionarily later species: the same distance from Homo sapiens as from snake, ox, rabbit, rat, pig, tiger and boar.[9][10] Calculations using the hypothesis are said to yield results that capture the branching and punctuated nature of speciation, preserve its gradually increasing fractal complexity over time, and are consistent with patterns of speciation deduced from the fossil record.[9]


Since the 1960s, when the concept of genetic diversity was popularized by Richard Lewontin,[11] the amount of genetic diversity in populations has been assumed to tick steadily at a rate timed by a molecular clock, which was calibrated on the rate of change of proteins such as vertebrate hemoglobin[12] and cytochrome c, the latter analyzed by Emanuel Margoliash. The conclusion that genetic diversity would accumulate within a population indefinitely was reached because it was assumed that every population's genome would continually accumulate mutations as time passed. The more mutations that were observed, the more basal and older a population was assumed to be, since there was thought to be no upper limit on how many mutations could accumulate.

The idea of timing this presumably stable and universal rate of mutation, and hence diversity, with a molecular clock was first proposed by Émile Zuckerkandl and Linus Pauling, and was later given a theoretical basis by Motoo Kimura's neutral theory. The clock was assumed to regulate all genetic variation both within and between species. Subsequently, the neutral theory and molecular clock were used in a variety of settings, most notably in phylogenetics, or the study of how different species change and pass on traits over time. Many measurements that are nearly ubiquitous in population genetics, such as the fixation index, are also based on the molecular clock.[13]

However, since its inception there have been objections to the fundamental assumptions of the neutral theory and its molecular clock, such as evidence that they may be affected by natural selection.[14][15] Despite this, the molecular clock was assumed to regulate all orthologous genes inherited from a common ancestor, and was used to set the historic rate of speciation across the animal kingdom, as well as to answer questions about the evolutionary and genetic relationships between species.[4][14]

Despite its widespread use, the molecular clock can be in error by a factor of ten depending on whether dates are being assessed within or before the past two million years, and by a factor of twenty within the same time-frame depending on exactly which way the molecular clock is applied.[4]

Additionally, it has been argued that there is no independent evidence to support the molecular clock's premise that all species have similar mutation rates,[16] and that the neutral theory fails to note and explain the common occurrence of overlapping mutations, where mutations in independently evolving species occur at orthologous overlapping protein positions at a rate too high to be neutral.[3] The hypothesis provides a framework for the genetic equidistance phenomenon, and it appears to align findings derived from predictable genomic patterns that have been observed in organisms ranging from simple ones like yeast up to the most complex, Homo sapiens.[17]

Contrary to Emanuel Margoliash's original assumption that genetic distance could be universally determined by the rate of mutation in a single protein, cytochrome c, and then calculated for all life on earth by time alone[18] (meaning that all mutations were set by that protein and had a biologically universal rate that is constant and steady), the genetic equidistance phenomenon could also be explained by the assumption that mutation rates are specific to each gene and might vary across species and within populations.[1][2]

In 2008, Shi Huang, a cancer researcher who was then a faculty member at the Sanford Burnham Prebys Medical Research Institute in La Jolla, California, and has since 2009 been at South Central University China, independently discovered the genetic equidistance phenomenon. He first described the Maximum Genetic Diversity theory in a preprint that year, and it was published as a peer-reviewed book chapter later in 2008.[19]

Fast and slow

Because simpler organisms are less likely to be affected by any single-base mutation in their exons, the functionally active coding stretches of their genome, the hypothesis considers them to be more genetically robust than more complex organisms, whose genomes are less tolerant of mutation and so are thought to be more fragile.[9] It is theorized that in complex organisms that depend on myriad interconnected networks of proteins and regulation, there is far less margin for error, since the odds that a substitution will create an erroneous and detrimental base change in crucial stretches of the genome increase as more DNA sequences and their subsequent proteins are needed in additional fine-tuned cell types and epigenetic functions.[9]

According to the hypothesis, these stretches of code, like an interspersed series of genetic capstones, become relatively less tolerant of mutation and will appear to be mutating more slowly when compared to faster-mutating, less fundamental regions of the genome.[9][1] One example of this is the observation that mtDNA is not selectively neutral, meaning that certain versions of its alleles are much more beneficial than others, since its average diversity holds across all animal phyla, capturing mtDNA's comparatively slower rate of mutation and intrinsic functional importance.[9]

The hypothesis asserts that as organismal and social complexity grows, the need to preserve the fundamental structure of an organism – the most basal directions involved in the most basic development and functions which mutate more slowly – becomes balanced against the need to be able to adapt to increasing numbers of environmental pressures and challenges – done by faster-mutating regions that respond adaptively to environmental pressures.[17] The hypothesis states that the variability in the rate of change causes evolutionary selective pressures to sort alleles into two rough groups: slow-mutating ones involved with an organism's most basic structure and function, and fast-mutating ones that respond quickly in order to increase the odds a beneficial mutation occurs and is preserved.[17]

As time passes and a species increases in complexity, it is theorized that a greater proportion of its genome becomes slow-mutating as a greater amount of information becomes needed to preserve a more complex organism's fundamental development and behavior. And with a larger proportion of its genome dedicated to intrinsic instructions, the species is considered to be more genetically fragile.[9]

Intertwined evolutionary effects

Under the hypothesis, as organisms increase in complexity, population-wide genetic diversity is regulated by the need to maintain a balance between those two broad categories: fast-mutating alleles that adapt quickly to the pressure of a given environment, and slow-mutating ones that preserve the most fundamental and basal instructions for the organism.[17][9] According to the hypothesis, maintaining this balance means that simpler organisms will have a higher percentage of their genome able to tolerate mutational change, since their less complex biological and epigenetic processes are more tolerant of change than those of more complex, genomically delicate organisms.[9] As organismal complexity increases, the margin for genomic error narrows and tolerance for new mutations shrinks, since within the framework higher-order life means more complex cellular mechanisms and more fragile biological processes.[9]

The hypothesis suggests that population-wide genetic diversity can increase up to a maximum that is set by the physiological and epigenetic complexity of the organism and its environmental interactions. Past that point fitness decreases, because the level of mutation becomes maladaptive by deleteriously altering an organism's fundamental instructions.[9] Under the hypothesis, having less than this maximum, ideal level of genetic diversity means poor capacity to adapt to changing environmental pressures, and having more means damage to the basic physiology of the organism because its most basal instructions become damaged.

The hypothesis proposes that a population will have the most evolutionary success when its diversity level is properly tuned to its environment, and when the levels of its slow- and fast-mutating alleles are optimally balanced.[17]

The slow clock

The hypothesis calculates the time and amount of genetic divergence between species by first randomly picking a statistically significant set of orthologous genes shared between any three macroevolutionarily distant species. Genes are sorted as either fast or slow after the orthologous genes of the two more closely related species are aligned alongside those of the third, less-related species. If no amino-acid positions overlap, that is, if the sequences share no mutated position, the gene is assigned a score of 0. For genes with any overlapping amino-acid positions, the higher the count of overlapping positions, the faster-mutating the gene is considered to be. Since no hard ratio is necessary to measure maximum genetic diversity, the genes are then sorted into roughly half slow-mutating and half fast-mutating.[6]
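
The sorting step described above can be sketched in Python. This is only an illustration under stated assumptions, not the published procedure: the function names and toy sequences are hypothetical, the sequences are assumed to be pre-aligned to equal length, and the rule used to call a position "overlapping" (all three species carry different residues at that aligned position) is one possible reading chosen for the example. The median is used as the cutoff only because the text asks for a roughly even split.

```python
from statistics import median


def count_overlaps(seq_a: str, seq_b: str, seq_c: str) -> int:
    """Count aligned positions where all three residues differ.

    Assumption for this sketch: such a position is treated as an
    "overlapping" position. Columns containing a gap ('-') are skipped.
    """
    if not (len(seq_a) == len(seq_b) == len(seq_c)):
        raise ValueError("sequences must be aligned to the same length")
    overlaps = 0
    for a, b, c in zip(seq_a, seq_b, seq_c):
        if "-" in (a, b, c):
            continue
        if a != b and a != c and b != c:
            overlaps += 1
    return overlaps


def split_fast_slow(alignments):
    """Score each gene by overlap count and split at the median score.

    `alignments` maps a gene name to a tuple of three aligned sequences.
    Genes at or below the median are labeled slow, the rest fast.
    """
    scores = {gene: count_overlaps(*seqs) for gene, seqs in alignments.items()}
    cutoff = median(scores.values())
    slow = sorted(g for g, s in scores.items() if s <= cutoff)
    fast = sorted(g for g, s in scores.items() if s > cutoff)
    return scores, slow, fast


# Hypothetical toy alignments (species A, species B, species C).
toy = {
    "geneW": ("MKTAYIAK", "MKTAYIAK", "MKTAYIAR"),
    "geneX": ("MKTAYIAK", "MKSAYIAK", "MKGAYIAK"),
    "geneY": ("ACDEFGHI", "SCDQFGHL", "TCDNFGHV"),
    "geneZ": ("ACDEFGHI", "ACDEFGHI", "ACDEFGHI"),
}

scores, slow, fast = split_fast_slow(toy)
print(scores)
print("slow:", slow)
print("fast:", fast)
```

With many genes tied at a score of 0, a median cutoff will not give an exact half-and-half split, which is consistent with the text's "roughly half" but means a real analysis would need its own tie-breaking rule.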

In any group of species A, B, and C, with A the most complex and C the least, the fact that A and B might appear closer to each other when their entire genomes are considered does not necessarily mean they can be grouped together with C considered an outgroup. To chart their genetic relationships, only the distances between slow-evolving genes can be used under maximum genetic diversity. Only when both A and B are the same distance away from C, as measured by their slow-evolving genes, can they be considered a separate clade.[6]
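
The clade test described above can be sketched as follows. This is a hedged illustration rather than the method used in the cited work: distance is assumed to be the fraction of differing aligned positions (a simple p-distance) over concatenated slow genes, "the same distance" is interpreted as agreement within a chosen tolerance, and the function names, tolerance value, and toy sequences are all hypothetical.

```python
def p_distance(seq_x: str, seq_y: str) -> float:
    """Fraction of aligned, ungapped positions at which two sequences differ."""
    if len(seq_x) != len(seq_y):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    differing = 0
    for x, y in zip(seq_x, seq_y):
        if x == "-" or y == "-":
            continue
        compared += 1
        if x != y:
            differing += 1
    if compared == 0:
        raise ValueError("no comparable positions")
    return differing / compared


def equidistant_to_outgroup(slow_genes, tolerance=0.02):
    """Compare A-to-C and B-to-C distances over slow genes only.

    `slow_genes` is a list of (seq_a, seq_b, seq_c) aligned triples.
    Returns (d_ac, d_bc, is_equidistant). The tolerance is an arbitrary
    value chosen for this sketch.
    """
    cat_a = "".join(a for a, _, _ in slow_genes)
    cat_b = "".join(b for _, b, _ in slow_genes)
    cat_c = "".join(c for _, _, c in slow_genes)
    d_ac = p_distance(cat_a, cat_c)
    d_bc = p_distance(cat_b, cat_c)
    return d_ac, d_bc, abs(d_ac - d_bc) <= tolerance


# Hypothetical slow genes where A and B are equally far from C.
slow_equal = [
    ("MKTAYIAK", "MKTAYIAK", "MKTAYIAR"),
    ("ACDEFGHI", "ACDEFGHI", "SCDEFGHI"),
]
# Hypothetical slow genes where B is farther from C than A is.
slow_unequal = [
    ("MKTAYIAK", "MKTAYIAK", "MKTAYIAR"),
    ("ACDEFGHI", "ACDEFGHL", "SCDEFGHI"),
]

result_equal = equidistant_to_outgroup(slow_equal)
result_unequal = equidistant_to_outgroup(slow_unequal)
print(result_equal)
print(result_unequal)
```

In the first toy set A and B would pass the equidistance check and could be treated as a clade with C as the outgroup; in the second they would not. Real analyses would need a statistical test in place of a fixed tolerance.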

The "slow clock" is based on reports that using the molecular clock derived from the neutral theory to time species divergence can be off by up to a factor of twenty, depending on whether a single unchanging species is being used or an inter-species comparison is being made. The hypothesis holds that the molecular clock can still be used to accurately measure genetic diversity in relatively short time scales among similar species, while its accuracy fades when it is applied to windows over hundreds of thousands of years, and when applied to species with diverse phenotypic expression.[14][17][7]

Overlapping origin stories

Starting from the fact that some regions of the genome preserve mutations faster than others, the approach builds an explanatory framework for the observation that population-wide genetic diversity does not always increase without an upper limit. Such an increase must always happen under the neutral theory and its infinite sites models, which posit that a population will continue to accumulate mutations at a steady rate indefinitely.[20][2]

The hypothesis posits that when two populations have different genetic diversity levels, it does not necessarily mean that the population with lower genetic diversity is descended from the one with higher genetic diversity, as implied by the neutral theory.[9] Under the neutral theory's molecular clock, the most basal or older populations will always have the highest level of diversity, because existing first means more mutations would have had time to accumulate in their genome. However, higher overall genomic diversity may simply be due to having more fast-mutating alleles needed to deal with a wider array of environmental challenges. Since genetic distance can only be measured by slow-mutating genes, which may make up a minority of the genome, raw overall diversity alone should not be used to derive genetic relationships.[17]

The fact that most broad phenotypic traits are regulated by multiple loci is also incompatible with the neutral theory, since it would be statistically unlikely for enough linkage disequilibrium to form across the genome if mutations were occurring randomly. The hypothesis accounts for this, since phenotypically linked fast-mutating SNPs are recognized to respond to selective pressures more rapidly than the slow-mutating more basal SNPs.[17] The hypothesis also explains why raw genetic diversity does not flow temporally from basal to more modern as a concrete rule.[2]

The hypothesis contradicts the fixation index, which assumes the neutral theory applies across the entire genome and only considers fast-mutating autosomal DNA in population genetics analyses.[1][2]


  1. Huang, Shi (2016-07-01). "New thoughts on an old riddle: What determines genetic diversity within and between species?". Genomics. Comprehensive functionality of genomic DNA. 108 (1): 3–10. doi:10.1016/j.ygeno.2016.01.008. ISSN 0888-7543. PMID 26835965.
  2. Ho, Mae Wan (2010), "Development and Evolution Revisited", Handbook of Developmental Science, Behavior, and Genetics, John Wiley & Sons, Ltd, pp. 61–109, doi:10.1002/9781444327632.ch4, ISBN 9781444327632
  3. Rizzato, Francesca; Zamuner, Stefano; Pagnani, Andrea; Laio, Alessandro (2019-12-02). "A common root for coevolution and substitution rate variability in protein sequence evolution". Scientific Reports. 9 (1): 18032. Bibcode:2019NatSR...918032R. doi:10.1038/s41598-019-53958-w. ISSN 2045-2322. PMC 6888882. PMID 31792239.
  4. Pulquério, Mário J. F.; Nichols, Richard A. (2007-04-01). "Dates from the molecular clock: how wrong can we be?". Trends in Ecology & Evolution. 22 (4): 180–184. doi:10.1016/j.tree.2006.11.013. ISSN 0169-5347. PMID 17157408.
  5. Huang, Y. M.; Xia, M. Y.; Huang, S. (May 2013). "[Evolutionary process unveiled by the maximum genetic diversity hypothesis]". Yi Chuan = Hereditas. 35 (5): 599–606. doi:10.3724/sp.j.1005.2013.00599. ISSN 0253-9772. PMID 23732666.
  6. Huang, Shi (June 10, 2012). "Primate phylogeny: molecular evidence for a pongoid clade excluding humans and a prosimian clade containing tarsiers" (PDF). Science China Life Sciences. 55 (8): 709–725. doi:10.1007/s11427-012-4350-7. PMID 22932887 – via Springer.
  7. Kern, Andrew D.; Hahn, Matthew W. (2018-06-01). "The Neutral Theory in Light of Natural Selection". Molecular Biology and Evolution. 35 (6): 1366–1371. doi:10.1093/molbev/msy092. ISSN 0737-4038. PMC 5967545. PMID 29722831.
  8. Chen, Bingjie; Shi, Zongkun; Chen, Qingjian; Shen, Xu; Shibata, Darryl; Wen, Haijun; Wu, Chung-I. (2019-07-01). "Tumorigenesis as the Paradigm of Quasi-neutral Molecular Evolution". Molecular Biology and Evolution. 36 (7): 1430–1441. doi:10.1093/molbev/msz075. ISSN 0737-4038. PMID 30912799.
  9. Yuan, Dejian; Huang, Shi (2017-07-01). "Genetic equidistance at nucleotide level". Genomics. 109 (3): 192–195. doi:10.1016/j.ygeno.2017.03.002. ISSN 0888-7543. PMID 28315383.
  10. Hahn, Matthew W. (2008). "Toward a Selection Theory of Molecular Evolution". Evolution. 62 (2): 255–265. doi:10.1111/j.1558-5646.2007.00308.x. ISSN 1558-5646. PMID 18302709.
  11. Lewontin, Richard C (1974). The Genetic Basis for Evolutionary Change. United States: Columbia University Press. pp. 1–16. ISBN 0-231-03392-3.
  12. Zuckerkandl, Emile; Pauling, Linus (1965-03-01). "Molecules as documents of evolutionary history". Journal of Theoretical Biology. 8 (2): 357–366. doi:10.1016/0022-5193(65)90083-4. ISSN 0022-5193. PMID 5876245.
  13. Brown, A. H. D. (1970-12-01). "The estimation of Wright's fixation index from genotypic frequencies". Genetica. 41 (1): 399–406. doi:10.1007/BF00958921. ISSN 1573-6857. PMID 5488990.
  14. Ayala, F. J. (January 1999). "Molecular clock mirages". BioEssays. 21 (1): 71–75. doi:10.1002/(SICI)1521-1878(199901)21:1<71::AID-BIES9>3.0.CO;2-B. ISSN 0265-9247. PMID 10070256.
  15. Langley, Charles H.; Fitch, Walter M. (1974-09-01). "An examination of the constancy of the rate of molecular evolution". Journal of Molecular Evolution. 3 (3): 161–177. Bibcode:1974JMolE...3..161L. doi:10.1007/BF01797451. ISSN 1432-1432. PMID 4368400.
  16. Copley, Richard R; Schultz, Jörg; Ponting, Chris P; Bork, Peer (1999-06-01). "Protein families in multicellular organisms". Current Opinion in Structural Biology. 9 (3): 408–415. doi:10.1016/S0959-440X(99)80055-4. ISSN 0959-440X. PMID 10361098.
  17. Huang, Shi (2016-07-01). "New thoughts on an old riddle: What determines genetic diversity within and between species?". Genomics. Comprehensive functionality of genomic DNA. 108 (1): 3–10. doi:10.1016/j.ygeno.2016.01.008. ISSN 0888-7543. PMID 26835965.
  18. Margoliash, E. (October 1963). "Primary Structure and Evolution of Cytochrome C". Proceedings of the National Academy of Sciences of the United States of America. 50 (4): 672–679. Bibcode:1963PNAS...50..672M. doi:10.1073/pnas.50.4.672. ISSN 0027-8424. PMC 221244. PMID 14077496.
  19. Huang, Shi (2008). "Ancient fossil specimens of extinct species are genetically more distant to an outgroup than extant sister species are". Rivista di biologia. 101 (1): 93–108. PMC 2649772. PMID 18600632.
  20. Biswas, Kakali; Chakraborty, Sandip; Podder, Soumita; Ghosh, Tapash Chandra (2016-07-01). "Insights into the dN/dS ratio heterogeneity between brain-specific genes and widely-expressed genes in species of different complexity". Genomics. Comprehensive functionality of genomic DNA. 108 (1): 11–17. doi:10.1016/j.ygeno.2016.04.004. ISSN 0888-7543. PMID 27126306.

This article "Maximum genetic diversity hypothesis" is from Wikipedia. The list of its authors can be seen in its historical and/or the page Edithistory:Maximum genetic diversity hypothesis. Articles copied from Draft Namespace on Wikipedia could be seen on the Draft Namespace of Wikipedia and not main one.