Bounds on the Number of Huffman and Binary-Ternary Trees
Abstract
Huffman coding is a widely used method for lossless data compression because it encodes data optimally according to how often each character occurs, with the code represented as a Huffman tree. An n-ary Huffman tree is a connected, acyclic graph in which each vertex has either n child vertices or none. Vertices with no children are called leaves. We let $h_n(q)$ denote the total number of n-ary Huffman trees with q leaves. In this paper, we use a recursive method to generate upper and lower bounds on $h_n(q)$ and obtain, for n = 2, $h_2(q) \approx (0.1418532)(1.7941471)^q + (0.0612410)(1.2795491)^q$. This matches the best results achieved by Elsholtz, Heuberger, and Prodinger in August 2011. Our approach reveals patterns in Huffman trees that we use in our analysis of the Binary-Ternary (BT) trees we introduce. Extending the study of Huffman trees to BT trees opens a new direction in data compression and paves the way for designing data-specific trees that minimize the storage space Huffman coding may waste. We prove a recursive formula for the number of BT trees with q leaves, and we provide further analysis and proofs to reach numeric bounds. Our results have broad applications in computer data compression. They also improve graphical representations of protein sequences, which facilitate the in-depth genome analysis used in researching evolutionary patterns.
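As a rough illustration of the quantity being bounded, the following Python sketch counts binary Huffman trees level by level and compares the counts with the two-term approximation quoted above. It rests on an assumption the abstract does not spell out: two trees are treated as the same when they have the same number of internal vertices on every level, which is the counting convention that reproduces the growth rate 1.7941471. The names `count_h2`, `approx_h2`, and `_count` are hypothetical and are not taken from the paper, and the recursion is a simple brute-force count, not the authors' bounding method.

```python
from functools import lru_cache

# Constants copied from the approximation stated in the abstract (n = 2).
A, R1 = 0.1418532, 1.7941471
B, R2 = 0.0612410, 1.2795491


@lru_cache(maxsize=None)
def _count(level_nodes: int, leaves_left: int) -> int:
    """Ways to finish a tree that has `level_nodes` vertices on the current
    level and `leaves_left` leaves still to place on this level or below."""
    total = 0
    for internal in range(level_nodes + 1):
        leaves_here = level_nodes - internal
        remaining = leaves_left - leaves_here
        if internal == 0:
            # The tree stops here: every vertex on this level is a leaf.
            total += 1 if remaining == 0 else 0
        elif remaining >= 2 * internal:
            # Each internal vertex has 2 children, and every child
            # eventually contributes at least one leaf.
            total += _count(2 * internal, remaining)
    return total


def count_h2(q: int) -> int:
    """Number of binary Huffman trees with q leaves (assumed convention:
    trees are distinguished only by their per-level internal-vertex counts)."""
    return _count(1, q)


def approx_h2(q: int) -> float:
    """Two-term approximation of h_2(q) quoted in the abstract."""
    return A * R1**q + B * R2**q


if __name__ == "__main__":
    for q in range(1, 16):
        print(q, count_h2(q), round(approx_h2(q), 3))
```

Under this convention the first counts are 1, 1, 1, 2, 3, 5, 9, 16, 28, 50 for q = 1 through 10, and the approximation is already within one unit of the exact count at q = 10.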