Split probabilities and species tree inference under the multispecies coalescent model
Abstract
Using topological summaries of gene trees as a basis for species tree inference is a promising approach to obtain acceptable speed on genomic-scale datasets, and to avoid some undesirable modeling assumptions. Here we study the probabilities of splits on gene trees under the multispecies coalescent model, and how their features might inform species tree inference. After investigating the behavior of split consensus methods, we investigate split invariants --- that is, polynomial relationships between split probabilities. These invariants are then used to show that, even though a split is an unrooted notion, split probabilities retain enough information to identify the rooted species tree topology for trees of more than 5 taxa, with one possible 6-taxon exception.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.