Towards an Optimal Space-and-Query-Time Index for Top-k Document Retrieval

Abstract

Let $\mathcal{D} = \{d_1, d_2, \ldots, d_D\}$ be a given set of $D$ string documents of total length $n$. Our task is to index $\mathcal{D}$ such that the $k$ most relevant documents for an online query pattern $P$ of length $p$ can be retrieved efficiently. We propose an index of size $|CSA| + n\log D\,(2+o(1))$ bits with $O(t_s(p) + k\log\log n + \mathrm{poly}\log\log n)$ query time for the basic relevance metric term-frequency, where $|CSA|$ is the size (in bits) of a compressed full-text index of $\mathcal{D}$ that supports searching for a pattern of length $p$ in $O(t_s(p))$ time. We further reduce the space to $|CSA| + n\log D\,(1+o(1))$ bits, at the cost of a query time of $O(t_s(p) + k(\log\sigma \log\log n)^{1+\epsilon} + \mathrm{poly}\log\log n)$, where $\sigma$ is the alphabet size and $\epsilon > 0$ is any constant.
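To make the problem statement concrete, the following is a minimal sketch of a naive baseline for top-k retrieval under the term-frequency metric. It is not the index proposed in the paper: it scans every document on each query, so its cost grows with the total length n, which is exactly what the compressed index is designed to avoid. The function names `count_occurrences` and `top_k_by_term_frequency` are hypothetical, and two choices are assumptions made for illustration: occurrences are counted with overlaps, and ties are broken by the smaller document identifier.

```python
import heapq


def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, allowing overlaps (assumption)."""
    if not pattern:
        return 0
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        # Advance by one position so overlapping matches are counted too.
        start = text.find(pattern, start + 1)
    return count


def top_k_by_term_frequency(documents, pattern, k):
    """Return up to k (doc_id, tf) pairs with the highest term frequency.

    Documents in which the pattern does not occur are skipped.
    Ties are broken by the smaller document id (assumption).
    """
    scored = []
    for doc_id, doc in enumerate(documents):
        tf = count_occurrences(doc, pattern)
        if tf > 0:
            scored.append((tf, doc_id))
    # Select the k best pairs: highest tf first, then lowest doc id.
    best = heapq.nsmallest(k, scored, key=lambda pair: (-pair[0], pair[1]))
    return [(doc_id, tf) for tf, doc_id in best]


if __name__ == "__main__":
    docs = ["banana", "ananas", "bandana", "cabana"]
    print(top_k_by_term_frequency(docs, "ana", 2))  # [(0, 2), (1, 2)]
```

In this toy collection, "ana" occurs twice in each of the first two documents and once in each of the others, so the top-2 answer consists of documents 0 and 1.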
