The distributions under two species-tree models of the total number of ancestral configurations for matching gene trees and species trees
Abstract
Given a gene-tree labeled topology $G$ and a species tree $S$, the "ancestral configurations" at an internal node $k$ of $S$ represent the combinatorially different sets of gene lineages that can be present at $k$ when all possible realizations of $G$ in $S$ are considered. Ancestral configurations have been introduced as a data structure for evaluating the conditional probability of a gene-tree labeled topology given a species tree, and their enumeration assists in describing the complexity of this computation. In the case that the gene-tree labeled topology $G=t$ matches that of the species tree $S$, we use techniques of analytic combinatorics to study distributional properties of the "total" number of ancestral configurations measured across the different nodes of a random labeled topology $t$ selected under the uniform and the Yule probability models. Under both of these probabilistic scenarios, we show that the total number $T_n$ of ancestral configurations of a random labeled topology of $n$ taxa asymptotically follows a lognormal distribution. Over uniformly distributed labeled topologies, the asymptotic growth of the mean and the variance of $T_n$ are found to satisfy $\mathbb{E}_U[T_n] \sim 2.449 \cdot 1.333^n$ and $\mathbb{V}_U[T_n] \sim 5.050 \cdot 1.822^n$, respectively. Under the Yule model, which assigns higher probabilities to more balanced labeled topologies, we obtain the mean $\mathbb{E}_Y[T_n] \sim 1.425^n$ and the variance $\mathbb{V}_Y[T_n] \sim 2.045^n$.
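To make the quantity $T_n$ concrete, the following is a minimal sketch of how the total number of ancestral configurations might be counted for a single matching labeled topology. It rests on an assumption that the abstract does not state: that the number $c(k)$ of configurations at an internal node $k$ satisfies $c(k) = (c(k_\ell) + 1)(c(k_r) + 1)$, with $c = 0$ at a leaf, where $k_\ell$ and $k_r$ are the children of $k$. The nested-tuple tree encoding and the function name `count_configurations` are hypothetical choices made only for illustration.

```python
def count_configurations(tree):
    """Return (c, total) for a rooted binary tree.

    A leaf is any non-tuple value (e.g. a taxon label); an internal node
    is a 2-tuple (left, right). This encoding is an illustrative choice.

    c     -- configurations at the root of `tree` (0 for a leaf)
    total -- sum of c over all internal nodes of `tree`
    """
    if not isinstance(tree, tuple):
        return 0, 0  # a leaf carries no configurations of its own
    left, right = tree
    c_left, total_left = count_configurations(left)
    c_right, total_right = count_configurations(right)
    # Assumed recursion: each child subtree contributes either one of its
    # own configurations or its single root lineage (the "+ 1").
    c = (c_left + 1) * (c_right + 1)
    return c, total_left + total_right + c


# Usage: a cherry, a 3-taxon caterpillar, and a balanced 4-taxon tree.
cherry = ("a", "b")
caterpillar3 = (("a", "b"), "c")
balanced4 = (("a", "b"), ("c", "d"))

print(count_configurations(cherry))        # (1, 1)
print(count_configurations(caterpillar3))  # (2, 3)
print(count_configurations(balanced4))     # (4, 6)
```

For the 3-taxon caterpillar, the two root configurations under this assumption correspond to the lineage sets {a, b, c} and {(a,b), c}, and adding the single configuration at the cherry node gives a total of 3.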