Statistical phylogenetics study resources

Yuwei Bao, April 6, 2023

Must-read papers for graduate students

This list was compiled by Bob Thomson [2] and posted at treethinkers.org [1] in 2015:

Bull, J. J., Huelsenbeck, J. P., Cunningham, C. W., Swofford, D. L., & Waddell, P. J. (1993). Partitioning and combining data in phylogenetic analysis. Systematic Biology, 42(3), 384–397.

Cavalli-Sforza, L. L., & Edwards, A. W. F. (1967). Phylogenetic analysis: models and estimation procedures. The American Journal of Human Genetics, 19, 233–257.

Edwards, S. V. (2009). Is a new and general theory of molecular systematics emerging? Evolution, 63, 1–19.

Felsenstein, J. (1973). Maximum likelihood and minimum-steps methods for estimating evolutionary trees from data on discrete characters. Systematic Zoology, 22, 240–249.

Felsenstein, J. (1978). Cases in which parsimony or compatibility methods will be positively misleading. Systematic Zoology, 27, 401–410.

Felsenstein, J. (1981). Evolutionary trees from DNA sequences: a maximum likelihood approach. Journal of Molecular Evolution, 17, 368–376.

Felsenstein, J. (1985). Confidence limits on phylogenies: an approach using the bootstrap. Evolution, 39, 783–791.

Felsenstein, J. (1985). Phylogenies and the comparative method. American Naturalist, 125, 1–15.

Goldman, N. (1993). Statistical tests of models of DNA substitution. Journal of Molecular Evolution, 36, 182–198.

Hillis, D. M., & Bull, J. J. (1993). An empirical test of bootstrapping as a method for assessing confidence in phylogenetic analysis. Systematic Biology, 42, 182–192.

Holder, M., & Lewis, P. O. (2003). Phylogeny estimation: traditional and Bayesian approaches. Nature Reviews Genetics, 4, 275–284.

Kumar, S., Filipski, A. J., Battistuzzi, F. U., Kosakovsky Pond, S. L., & Tamura, K. (2012). Statistics and truth in phylogenomics. Molecular Biology and Evolution, 29, 457–472.

Maddison, W. P. (1997). Gene trees in species trees. Systematic Biology, 46, 523–536.

Pauling, L., & Zuckerkandl, E. (1963). Chemical paleogenetics. Acta Chemica Scandinavica, 17, S9–S16.

Sullivan, J., & Swofford, D. L. (1997). Are guinea pigs rodents? The importance of adequate models in molecular phylogenetics. Journal of Mammalian Evolution, 4, 77–86.

The following were recommended by Joe Felsenstein [3] in 2016:

Yang, Z. 1993. Maximum-likelihood estimation of phylogeny from DNA sequences when substitution rates differ over sites. Molecular Biology and Evolution 10: 1396-1401. [Use of gamma distribution of rate variation in ML phylogenies]

Yang, Z. 1994. Maximum likelihood phylogenetic estimation from DNA sequences with variable rates over sites: approximate methods. Journal of Molecular Evolution 39: 306-314. [Approximating gamma distribution in ML phylogenies by an HMM]

Yang, Z. 1995. A space-time process model for the evolution of DNA sequences. Genetics 139: 993-1005. [Allowing for autocorrelated rates along the molecule using an HMM for ML phylogenies]

Felsenstein, J. and G. A. Churchill. 1996. A Hidden Markov Model approach to variation among sites in rate of evolution. Molecular Biology and Evolution 13: 93-104. [HMM approach to evolutionary rate variation]

Thorne, J. L., N. Goldman, and D. T. Jones. 1996. Combining protein evolution and secondary structure. Molecular Biology and Evolution 13: 666-673. [HMM for secondary structure of proteins, with phylogenies]

The following were recommended by Jeffrey Thorne [4] in 2016:

Posterior predictive inference in phylogenetics: J. P. Bollback. 2002. Molecular Biology and Evolution 19: 1171-1180.

Harmonic Mean and other techniques for estimating Bayes factors: Newton and Raftery. 1994. Journal of the Royal Statistical Society. Series B. 56(1):3-48.

More reliable ways to approximate the marginal likelihood:

Thermodynamic Integration to Approximate Bayes Factors (adapted to molecular evolution data): Lartillot and Philippe. 2006. Syst. Biol. 55:195-207

Improving marginal likelihood estimation for Bayesian phylogenetic model selection. W. Xie, P.O. Lewis, Y. Fan, L. Kuo, M-H Chen. 2011. Syst Biol. 60(2):150-160.

Choosing among partition models in Bayesian phylogenetics. Y. Fan, R. Wu, M-H Chen, L Kuo, P.O. Lewis. 2011. Mol. Biol. Evol. 28(1):523-532.

Markov chain Monte Carlo without likelihoods. P. Marjoram, J. Molitor, V. Plagnol, and S. Tavaré. 2003. PNAS 100(26): 15324-15328.

H. Jeffreys. 1961. Theory of Probability (3rd ed.). Oxford; p. 432.

M.A. Beaumont, W. Zhang, D.J. Balding. Approximate Bayesian Computation in Population Genetics. 2002. Genetics 162:2025-2035.

The following were recommended by Jeffrey Thorne [5]:

Carlin, B. P., and T. A. Louis. 1996. Bayes and Empirical Bayes Methods for Data Analysis. Chapman and Hall, London.

Gelman, A., J. B. Carlin, H. S. Stern, and D. B. Rubin. 1995. Bayesian Data Analysis. Chapman and Hall, London.

Hastings WK (1970) Monte Carlo sampling methods using Markov chains and their applications. Biometrika 57:97–109

Metropolis, N., A. W. Rosenbluth, M. N. Rosenbluth, A. H. Teller, and E. Teller. 1953. Equation of state calculations by fast computing machines. J. Chem. Phys. 21: 1087–1092.

The following were recommended by Matthew Stephens and Eric C. Anderson:

Examples of importance sampling in genetics:

  1. Griffiths and Tavaré (1994)
  2. Matt Stephens and Peter Donnelly (2000)
  3. Anderson and Garza (2006)

Metropolis-coupled MCMC (Geyer 1991)

Other awesome resources

  1. Joe Felsenstein's courses
  2. 2016 SISG Module 19: Molecular Phylogenetics (Instructors: Mark Holder, Jeffrey Thorne, and Joe Felsenstein)
  3. Bayesian Methods class by Rebecca C. Steorts at Duke University
  4. 2019 Bodega Applied Phylogenetics Workshop by UC Davis & Bodega Marine Laboratory
  5. Phylogenetics Seminars organized by Frederick "Erick" Matsen
  6. Phylogenetics discussion forum organized by Frederick "Erick" Matsen
  7. Paul O. Lewis' grad-level Phylogenetics class
  8. Jeffrey L. Thorne's grad-level Bioinformatics II class
  9. Bayesian Inference notes with R examples by Ville Hyvönen & Topias Tolonen
  10. Matthew Stephens's fiveMinuteStats: basic statistics concepts with practical R code
  11. Nicolas Lartillot's blog: The Bayesian kitchen
  12. Book: Statistical Rethinking: A Bayesian Course with Examples in R and Stan by Richard McElreath
  13. Introduction to Computational Molecular Biology: Molecular Evolution at the University of Washington in 2010. Note: Dr. Felsenstein starts teaching likelihood at 31:30.
  14. Eric C. Anderson's Handbook on Practical Computing and Bioinformatics for Conservation and Evolutionary Genomics
  15. Eric C. Anderson's MCMC simulations/visualizations demo. Note: see also Eric C. Anderson's SISG MCMC OpenGL Demos tutorial on YouTube.
  16. Book chapter: MCMC using Hamiltonian dynamics by Radford M. Neal
  17. Paper: A Conceptual Introduction to Hamiltonian Monte Carlo by Michael Betancourt

  1. http://treethinkers.org/update-must-read-papers-for-graduate-students/

  2. http://thomsonlab.org/people/bob-thomson/

  3. https://evolution.gs.washington.edu/sisg/2016/2016_SISG_19_5.pdf

  4. https://evolution.gs.washington.edu/sisg/2016/2016_SISG_19_9.pdf

  5. https://brcwebportal.cos.ncsu.edu/thorne/ftp_docs/bioinf2/sampling2023.pdf