Display options
Share it on

PLoS Curr. 2012 Dec 14;4:e4fd1286980c08. doi: 10.1371/4fd1286980c08.

An Algorithm for Calculating the Probability of Classes of Data Patterns on a Genealogy.

PLoS currents

Jordan M Koch, Mark T Holder

Affiliations

  1. University of Kansas.
  2. Department of Ecology and Evolutionary Biology, University of Kansas.

PMID: 23868168 PMCID: PMC3712476 DOI: 10.1371/4fd1286980c08

Abstract

Felsenstein's pruning algorithm allows one to calculate the probability of any particular data pattern arising on a phylogeny given a model of character evolution. Here we present a similar dynamic programming algorithm. Our algorithm treats the tree and model as known. The algorithm makes it feasible to calculate the probability that a randomly selected character will be a member of a particular class of character patterns. Specifically, we are interested in binning patterns by the number of parsimony steps and the set of states observed at the tips of the tree. This algorithm was developed to expand the range of data set sizes that can be used with Waddell et al.'s marginal testing approach for assessing the adequacy of a model. The algorithms introduced can also be used in likelihood calculations which correct for ascertainment biases. For example, Lewis introduced an Mkv model which corrects for the lack of constant sites. The probability of a constant pattern arising can be calculated using the algorithm that we present, or by enumerating all possible constant patterns and calculating the probability of each one. Because the number of constant data patterns is small, both methods are efficient. However, elaborations of the Mkv model (such as those in Nylander et al) require calculating the probability of parsimony-uninformative patterns arising. For large trees and characters with many possible character states, the number of possible parismony-uninformative patterns is immense. In these cases, the algorithms introduced here will be more efficient. The algorithm has been implemented in open source software written in C++.

References

  1. J Mol Evol. 1981;17(6):368-76 - PubMed
  2. Syst Biol. 2012 Jan;61(1):170-3 - PubMed
  3. J Mol Evol. 2009 Oct;69(4):289-99 - PubMed
  4. J Mol Evol. 1993 Feb;36(2):182-98 - PubMed
  5. Evolution. 1992 Feb;46(1):159-173 - PubMed
  6. Bioinformatics. 2003 Nov 22;19(17):2330-1 - PubMed
  7. J Mol Evol. 1992 Jul;35(1):17-31 - PubMed
  8. Syst Biol. 2004 Feb;53(1):47-67 - PubMed
  9. Syst Zool. 1970 Jun;19(2):99-113 - PubMed
  10. J Mol Evol. 1993 Dec;37(6):650-61 - PubMed

Publication Types

Grant support