Defining binary phylogenetic trees using parsimony: new bounds
Abstract
Phylogenetic trees are frequently used to model evolution. Such trees are typically reconstructed from data such as DNA, RNA, or protein alignments, using methods based on criteria such as maximum parsimony (amongst others). Maximum parsimony has been assumed to work well for data with only a few state changes. Recently, some progress has been made towards formally proving this assertion. For instance, it has been shown that each binary phylogenetic tree T with n ≥ 20k leaves is uniquely defined by the set A_k(T), which consists of all characters with parsimony score k on T. In the present manuscript, we show that this statement holds for all n ≥ 4k, thus drastically lowering the lower bound for n from 20k to 4k. However, it has been known that for n ≤ 2k and k ≥ 3, A_k(T) does not in general define T. We improve this result by showing that the latter statement extends from n ≤ 2k to n ≤ 2k+2. We thereby reduce the interval of values of n for which it is unknown whether trees T on n taxa are defined by A_k(T) from the previous interval [2k+1, 20k-1] to [2k+3, 4k-1]. Moreover, we close this gap completely for the nearest neighbor interchange (NNI) neighborhood of T in the following sense: we show that as long as n ≥ 2k+3, no tree that is one NNI move away from T (and thus very similar to T) shares the same A_k-alignment.
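To make the definition of A_k(T) concrete, here is a minimal Python sketch that is not taken from the paper. It computes parsimony scores with Fitch's algorithm and enumerates A_k(T) by brute force for a small tree. The nested-tuple tree encoding, the helper names (`fitch`, `parsimony_score`, `canonical_characters`, `a_k`), the two 7-leaf example trees and the convention of listing characters only up to relabelling of states are all hypothetical choices made for illustration. The example uses n = 7 = 2k + 3 for k = 2, so it merely illustrates one instance of the NNI statement and proves nothing.

```python
from itertools import product

# Hypothetical encoding: a rooted binary tree is a leaf label (str) or a pair
# (left, right). The parsimony score does not depend on where an unrooted
# tree is rooted, so rooting it arbitrarily is harmless here.

def leaves_of(tree):
    """Leaf labels in left-to-right order."""
    if isinstance(tree, str):
        return [tree]
    left, right = tree
    return leaves_of(left) + leaves_of(right)

def fitch(tree, char):
    """Fitch's algorithm: return (candidate state set, number of changes)."""
    if isinstance(tree, str):
        return {char[tree]}, 0
    left, right = tree
    set_l, cost_l = fitch(left, char)
    set_r, cost_r = fitch(right, char)
    common = set_l & set_r
    if common:
        return common, cost_l + cost_r
    # Disjoint child sets force one extra state change.
    return set_l | set_r, cost_l + cost_r + 1

def parsimony_score(tree, char):
    return fitch(tree, char)[1]

def canonical_characters(labels, max_states):
    """Yield characters up to relabelling of states: states are numbered
    0, 1, 2, ... in order of first appearance along `labels`."""
    for states in product(range(max_states), repeat=len(labels)):
        next_new = 0
        canonical = True
        for s in states:
            if s > next_new:
                canonical = False
                break
            if s == next_new:
                next_new += 1
        if canonical:
            yield dict(zip(labels, states))

def a_k(tree, k):
    """A_k(T) as a set of state tuples (leaves in sorted order).
    A character with score k uses at most k + 1 states, so k + 1 suffices."""
    labels = sorted(leaves_of(tree))
    return frozenset(
        tuple(char[x] for x in labels)
        for char in canonical_characters(labels, k + 1)
        if parsimony_score(tree, char) == k
    )

# Hypothetical 7-leaf caterpillar T and its NNI neighbour T2
# (b and c swapped across the edge separating {a, b} from the rest).
T = ((((("a", "b"), "c"), "d"), "e"), ("f", "g"))
T2 = ((((("a", "c"), "b"), "d"), "e"), ("f", "g"))

char = {"a": 0, "b": 0, "c": 1, "d": 1, "e": 1, "f": 0, "g": 0}
print(parsimony_score(T, char), parsimony_score(T2, char))  # 2 3
print(len(a_k(T, 1)))                                       # 11
print(a_k(T, 2) != a_k(T2, 2))                              # True
```

For k = 1, A_1(T) is exactly the set of splits of T, which is why a binary tree on 7 leaves yields 2·7 − 3 = 11 characters. The character `char` has score 2 on T but score 3 on T2, so it witnesses that A_2(T) and A_2(T2) differ. Brute-force enumeration grows as (k+1)^n, so this sketch is only meant for tiny examples.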