Evolutionary Biology Coalescent Theory
Robert Griffiths
  • LAST REVIEWED: 12 January 2023
  • LAST MODIFIED: 12 January 2023
  • DOI: 10.1093/obo/9780199941728-0143


Introduction

The Kingman coalescent describes the ancestral tree of a sample of homologous genes from a population. Mutations occurring to ancestors in the tree determine the gene types in the sample. The coalescent originated as a mathematical concept and has become important in biology because it provides a way of looking back in time that allows ancestral inference, for example about parameters such as the mutation rate, the time to the most recent common ancestor of a sample (TMRCA), ages of mutations, the way population size has changed back in time, and geographical information. Specifically, the coalescent describes the ancestral tree of a sample of individuals from a neutral Wright-Fisher population of N haploid genes, with N large and constant over time. Random mating in the population is assumed. Because the population is large, the ancestral tree is approximately binary; that is, only two ancestral lineages can join, or coalesce, at any one instant. Time is measured in units of N generations, and technically N tends to infinity. With a large sample size, coalescence is fast, so much so that the coalescent with the sample size tending to infinity represents the ancestry back in time of the whole population as a well-defined coalescent tree starting with an infinite number of genes. Because of this property, the Kingman coalescent is said to come down from infinity. There is a connection between diffusion process models of allele frequencies and the coalescent. In the infinitely-many-alleles model, mutations are always to a new type not existing previously. The allele frequencies in a sample arise from mutations on coalescent lineages that are duplicated forward in time. The famous Ewens’ sampling formula is the probability distribution of the allele frequencies in a sample under the infinitely-many-alleles model. Extensions of the original coalescent model include variable population size, recombination, selection, and spatial models. The ancestral recombination graph (ARG) describes the ancestry of a sample of genes when recombination is present. The ancestral selection graph (ASG) describes the ancestry of a sample when allele types differ in selective advantage. The Lambda coalescent is a related model of ancestry in which multiple lineages can coalesce at the same time. Coalescent theory has contributed to both biology and mathematics, a nice interplay of disciplines.
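
The time scaling described above can be illustrated with a short simulation. What follows is a minimal Python sketch, not taken from any of the works cited here. It assumes the standard Kingman rates: while k lineages remain, each of the k(k-1)/2 pairs coalesces at rate 1 when time is measured in units of N generations. Under that assumption the expected TMRCA of a sample of size n is 2(1 - 1/n). The function names simulate_tmrca and expected_tmrca, the sample size, and the number of replicates are hypothetical choices made for illustration.

```python
import random


def simulate_tmrca(n, rng):
    """Draw one TMRCA for a sample of n lineages under the Kingman coalescent.

    Time is in units of N generations (assumed scaling). While k lineages
    remain, the waiting time to the next coalescence is exponential with
    rate k(k-1)/2, after which two lineages merge into one.
    """
    total_time = 0.0
    k = n
    while k > 1:
        rate = k * (k - 1) / 2.0  # number of pairs, each coalescing at rate 1
        total_time += rng.expovariate(rate)
        k -= 1  # binary merger: exactly two lineages coalesce
    return total_time


def expected_tmrca(n):
    """Sum of the mean waiting times 2/(k(k-1)) for k = 2..n, in closed form."""
    return 2.0 * (1.0 - 1.0 / n)


# Illustrative settings (assumptions, not values from the text).
sample_size = 10
replicates = 20000
rng = random.Random(1)

mean_tmrca = sum(simulate_tmrca(sample_size, rng) for _ in range(replicates)) / replicates
print("simulated mean TMRCA:", mean_tmrca)
print("theoretical mean TMRCA:", expected_tmrca(sample_size))
```

The simulated and theoretical means should agree closely. Because expected_tmrca(n) stays below 2 however large n becomes, the sketch also gives a rough sense of why coalescence is fast in large samples and why the tree remains well defined as the sample size tends to infinity.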

General Overviews

All the articles and books in this section are very accessible for a biologist with some mathematical knowledge. Hudson 1990 is an early review of coalescent theory focusing on genealogies with mutations and recombination. Emphasis is on the time to the most recent common ancestor (TMRCA) in a sample of genes and the number of segregating sites in a sample of DNA sequences. Donnelly and Tavaré 1995 is a slightly more mathematical paper covering robustness of the coalescent from more general populations than a Wright-Fisher population, variable population size, geographically structured populations, simulation, and inference. Nordborg 2007 covers early coalescent theory and generalizations that include robustness, variable population size, different time scales, the structured coalescent, the ancestral recombination graph, and balancing selection. Rosenberg and Nordborg 2002 discusses coalescent trees, gene trees, species trees, and the meaning of Mitochondrial Eve. Stephens 2007 gives an introduction to inference under the coalescent, detailing sequential importance sampling and Markov chain Monte Carlo (MCMC) techniques. Kuhner 2009 gives a review of coalescent software for inference at that time. Wakeley 2013 is a commentary with a wide coverage of initial developments and a discussion of the papers in an issue of Theoretical Population Biology devoted to the coalescent. Barton 2016 is a review of the contributions of Hudson and Kaplan in the 1980s and 1990s to coalescent theory, particularly discussing their idea of a structured coalescent where there is a random background from selection and recombination. Fu and Li 1999 discusses early coalescent theory, inference, and future prospects. Wakeley 2020 is a wide-ranging review covering initial developments, the infinitely-many-alleles model, lines of descent, the stepwise mutation model, and how the infinitely-many-sites model and gene genealogies are connected. It is also partly a review of papers that have appeared in Theoretical Population Biology up to 2020. Möhle 2000 gives a mathematical review of how the coalescent arises in a robust way from populations as their size tends to infinity. Included in the review are diploid and two-sex models. Möhle has published an extensive number of papers dealing with conditions for convergence to the coalescent from discrete models where the offspring distribution of individuals is exchangeable.

  • Barton, N. 2016. Richard Hudson and Norman Kaplan on the coalescent process. Genetics 202:865–866.

    DOI: 10.1534/genetics.116.187542

    A review of Hudson and Kaplan’s important papers in the 1980s and 1990s discussing the structured coalescent, with the original papers referenced. An account from a biological point of view.

  • Donnelly, P., and S. Tavaré. 1995. Coalescents and genealogical structure under neutrality. Annual Review of Genetics 29:401–421.

    DOI: 10.1146/

    A very clear and interesting review of coalescent theory at that time. The paper contains a description of the coalescent, robustness of the coalescent depending on the variance of the offspring distribution, variable population size, and the coalescent in geographically structured populations. A discussion is included on using coalescent theory in human genetics to infer the time to the most recent common ancestor of a population of genes.

  • Fu, Y. X., and W. H. Li. 1999. Coalescing into the 21st century: An overview and prospects of coalescent theory. Theoretical Population Biology 56:1–10.

    DOI: 10.1006/tpbi.1999.1421

    An easy-to-read historical overview of the contributions and prospects of coalescent theory. The review focuses on inference: tests for neutrality using segregating sites, inferring ancestral population size, estimating the recombination rate, and the potential application to human population genetics.

  • Hudson, R. R. 1990. Gene genealogies and the coalescent process. Oxford Surveys of Evolutionary Biology 7:1–44.

    A very nice, easy-to-read review of many coalescent models. This paper is often quoted in the biology literature. It covers separating the genealogical process from the mutation process in the coalescent, the coalescent with recombination, estimating the recombination rate, migration, balancing selection, and hitchhiking.

  • Kuhner, M. K. 2009. Coalescent genealogy samplers: Windows into population history. Trends in Ecology and Evolution 24:86–93.

    DOI: 10.1016/j.tree.2008.09.007

    A review of software available at the time of writing that uses coalescent methods for estimation of genetic parameters and ancestral inference.

  • Möhle, M. 2000. Ancestral processes in population genetics—the coalescent. Journal of Theoretical Biology 204:629–638.

    DOI: 10.1006/jtbi.2000.2032

    A mathematical summary of how the coalescent arises by convergence when the population size is large. The limit theorems are general, covering two-sex models, diploid models, and non-exchangeable offspring distributions. This is a mathematical paper.

  • Nordborg, M. 2007. Coalescent theory. In Handbook of statistical genetics. 3d ed. Edited by D. J. Balding, M. J. Bishop, and C. Cannings, 843–877. Hoboken, NJ: John Wiley and Sons.

    An interesting article covering basic coalescent theory that also discusses coalescence with two time scales, such as in models with selfing. Diploid models, recombination, and selection are covered. An easy-to-read paper.

  • Rosenberg, N. A., and M. Nordborg. 2002. Genealogical trees, coalescent trees, and the analysis of genetic polymorphisms. Nature Reviews Genetics 3:380–390.

    DOI: 10.1038/nrg795

    A review that includes the coalescent and a discussion of gene trees and species trees and problems with their inference.

  • Stephens, M. 2007. Inference under the coalescent. In Handbook of statistical genetics. 3d ed. Edited by D. J. Balding, M. J. Bishop, and C. Cannings, 878–908. Chichester, UK: John Wiley and Sons.

    An excellent, easy-to-read introduction to modern methods of inference from molecular data using the coalescent. Likelihood-based inference, MCMC, importance sampling, Product of Approximate Conditionals (PAC) models, and Approximate Bayesian Computation (ABC) are discussed.

  • Wakeley, J. 2013. Coalescent theory has many new branches. Theoretical Population Biology 87:1–4.

    DOI: 10.1016/j.tpb.2013.06.001

    A commentary in an issue of Theoretical Population Biology that was devoted to the coalescent. A nice review paper.

  • Wakeley, J. 2020. Developments in coalescent theory from single locus to chromosomes. Theoretical Population Biology 133:56–64.

    DOI: 10.1016/j.tpb.2020.02.002

    An excellent and easy-to-read modern review paper on the coalescent. The paper has an emphasis on going forward to chromosomes where recombination is important. There is an extensive set of references.
