On the inference of large phylogenies with long branches: How long is too long?

Abstract

Recent work has highlighted deep connections between sequence-length requirements for high-probability phylogeny reconstruction and the related problem of the estimation of ancestral sequences. In [Daskalakis et al.'09], building on the work of [Mossel'04], a tight sequence-length requirement was obtained for the CFN model. In particular, the required sequence length for high-probability reconstruction was shown to undergo a sharp transition (from $O(\log n)$ to $\mathrm{poly}(n)$, where $n$ is the number of leaves) at the "critical" branch length $g^*_{\mathrm{ML}}$ (if it exists) of the ancestral reconstruction problem. Here we consider the GTR model. For this model, recent results of [Roch'09] show that the tree can be accurately reconstructed with sequences of length $O(\log n)$ when the branch lengths are below $g^*_{\mathrm{KS}}$, known as the Kesten-Stigum (KS) bound. Although for the CFN model $g^*_{\mathrm{ML}} = g^*_{\mathrm{KS}}$, it is known that for the more general GTR models one has $g^*_{\mathrm{ML}} \geq g^*_{\mathrm{KS}}$, with a strict inequality in many cases. First, we show that this phenomenon also holds for phylogenetic reconstruction by exhibiting a family of symmetric models $Q$ and a phylogenetic reconstruction algorithm which recovers the tree from $O(\log n)$-length sequences for some branch lengths in the range $(g^*_{\mathrm{KS}}, g^*_{\mathrm{ML}})$. Second, we prove that phylogenetic reconstruction under GTR models requires a polynomial sequence length for branch lengths above $g^*_{\mathrm{ML}}$.
