Partial-Match Queries with Random Wildcards: In Tries and Distributed Hash Tables
Abstract
Consider an m-bit query q to a bitwise trie T. A wildcard * is an unspecified bit in q, for which the query asks about membership in both cases *=0 and *=1. Such partial-match queries with wildcards are commonly issued to tries. Assuming that w wildcards occur uniformly at random in q, the obvious upper bound on the average number of traversal steps in T is \(2^w m\). We show that the average does not exceed \[ \frac{m+1}{w+1} \left( 2^{w+2} - 2w - 4 \right) + m = O\left( \frac{2^w m}{w} \right), \] and equals this value exactly when T includes all the m-bit keys, which is the worst case. Here the query q is performed with the naive backtracking algorithm in T. It is similarly shown that the average is \(O\left( \frac{k^w m}{w} \right)\) in a general trie of maximum out-degree k. Our analysis for tries is extended to a distributed hash table (DHT), which is among the most frequently used decentralized data structures in networking. We show, under a natural probabilistic assumption for the largest class of DHTs, that the average number of hops required by an m-bit query q with w random wildcards to a DHT D meets the same asymptotic bound. As a result, q is answered with \(O\left( \frac{2^w m}{w} \right)\) hops on average rather than \(\Theta\left( 2^w m \right)\) in the four major DHTs Chord, Pastry, Tapestry and Kademlia. In addition, with a uniform key distribution for sufficiently many entries, we prove that a lookup request to the DHT Chord is answered correctly with O(m) hops and probability \(1 - 2^{-\Omega(m)}\). To the author's knowledge, the probability \(1 - 2^{-\Omega(m)}\) of correct lookup in Chord has not been identified so far.
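The worst-case claim in the abstract can be checked numerically for small m. The following is a minimal sketch, not the paper's own code: it builds the full bitwise trie, runs a naive backtracking search for every placement of w wildcards, and compares the exact average with the closed form. All function names are hypothetical. The step-counting convention is an assumption: every edge move is counted, whether descending or backtracking, and counting stops at the last descent. This convention reproduces the closed form in the cases tested, but the paper body may define a traversal step differently. Because the full trie is symmetric, the specified (non-wildcard) bits are fixed to 0 without changing the average.

```python
from fractions import Fraction
from itertools import combinations


def build_full_trie(m):
    """Nested-dict bitwise trie holding every m-bit key (the worst case)."""
    if m == 0:
        return {}
    return {"0": build_full_trie(m - 1), "1": build_full_trie(m - 1)}


def count_steps(trie, query):
    """Naive backtracking search for `query` (a string over '0', '1', '*').

    Assumed convention: each edge move, down or up, is one step, and the
    count stops at the last descent (the final climb back is not counted).
    """
    steps = 0      # every edge move so far, down or up
    last_down = 0  # value of `steps` right after the most recent descent

    def visit(node, depth):
        nonlocal steps, last_down
        if depth == len(query):
            return
        # A wildcard explores both children; a fixed bit explores one.
        branches = "01" if query[depth] == "*" else query[depth]
        for bit in branches:
            child = node.get(bit)
            if child is None:
                continue
            steps += 1          # descend one edge
            last_down = steps
            visit(child, depth + 1)
            steps += 1          # backtrack over the same edge

    visit(trie, 0)
    return last_down


def average_steps(m, w):
    """Exact average over all placements of w wildcards in an m-bit query."""
    trie = build_full_trie(m)
    total = Fraction(0)
    count = 0
    for positions in combinations(range(m), w):
        query = "".join("*" if i in positions else "0" for i in range(m))
        total += count_steps(trie, query)
        count += 1
    return total / count


def closed_form(m, w):
    """The bound stated in the abstract, as an exact rational number."""
    return Fraction(m + 1, w + 1) * (2 ** (w + 2) - 2 * w - 4) + m


if __name__ == "__main__":
    for m in range(1, 6):
        for w in range(m + 1):
            print(m, w, average_steps(m, w), closed_form(m, w))
```

For example, with m = 2 and w = 1 the two placements `*0` and `0*` cost 6 and 4 steps under this convention, so the average is 5, which matches the closed form.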