An Explicit $O(r\log r)$ Threshold for Attaining the Semple--Steel Bound with $r$-State Characters
Abstract
Let $d_r(n)$ be the maximum, over all binary phylogenetic trees with $n$ leaves, of the minimum number of $r$-state characters required to define the tree. Semple and Steel proved that $d_r(n)\ge\lceil (n-3)/(r-1)\rceil$, and Bordewich and Semple proved that equality holds for each fixed $r$ and all sufficiently large $n$. We study the corresponding threshold $n_r$, the least $N$ for which equality holds for every $n\ge N$. The Bordewich--Semple construction yields an explicit polynomial upper bound of order $O(r^5)$ for this threshold. We prove the near-linear estimate \[ 3r+1\le n_r\le 64(r-1)\log_2(r+1)+3 \qquad (r\ge 4). \] The proof constructs, for every binary phylogenetic tree with $m=n-3$ internal edges, a linked quartet certificate whose conflict graph has maximum degree at most $16\log_2(m+2)+4$. Equitable coloring then packs the certificate into exactly $\lceil m/(r-1)\rceil$ $r$-state characters once $m\ge 64(r-1)\log_2(r+1)$. We also include the lower bound $n_r\ge 3r+1$, obtained from the snowflake obstruction, and state the natural conjecture that this lower bound is the exact threshold for all $r\ge 4$. The conjectural endpoint is consistent with the known small-state thresholds: $n_4=13$ and $n_5=16$, while the cases $r=2,3$ are also explicitly classified.
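To get a feel for the gap between the two ends of the estimate, the bounds can be evaluated numerically. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it assumes the upper bound reads $64(r-1)\log_2(r+1)+3$ with a base-2 logarithm and that the Semple--Steel bound is the ceiling of $(n-3)/(r-1)$.

```python
import math


def semple_steel_bound(n: int, r: int) -> int:
    """Lower bound ceil((n - 3) / (r - 1)) on d_r(n), via integer arithmetic."""
    return -(-(n - 3) // (r - 1))


def threshold_lower(r: int) -> int:
    """Snowflake lower bound 3r + 1 on the threshold n_r."""
    return 3 * r + 1


def threshold_upper(r: int) -> float:
    """Upper bound 64 (r - 1) log2(r + 1) + 3 (base-2 logarithm is an assumption)."""
    return 64 * (r - 1) * math.log2(r + 1) + 3


if __name__ == "__main__":
    # Tabulate both bounds for a few values of r in the stated range r >= 4.
    for r in range(4, 9):
        print(r, threshold_lower(r), round(threshold_upper(r), 1))
```

Under these assumptions, `threshold_lower(4)` and `threshold_lower(5)` reproduce the small-state thresholds 13 and 16 quoted in the abstract, while `threshold_upper` grows like $r\log r$ rather than $r^5$.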