Distance-based species tree estimation: information-theoretic trade-off between number of loci and sequence length under the coalescent
Abstract
We consider the reconstruction of a phylogeny from multiple genes under the multispecies coalescent. We establish a connection with the sparse signal detection problem, where one seeks to distinguish between a distribution and a mixture of the distribution and a sparse signal. Using this connection, we derive an information-theoretic trade-off between the number of genes, m, needed for an accurate reconstruction and the sequence length, k, of the genes. Specifically, we show that to detect a branch of length f, one needs $m = \Theta(1/[f^{2}\sqrt{k}])$.