Efficient tree-structured categorical retrieval
Abstract
We study a document retrieval problem in a new framework where D text documents are organized in a category tree with a pre-defined number h of categories. This situation occurs, e.g., with taxonomic trees in biology or subject classification systems for scientific literature. Given a string pattern p and a category (level in the category tree), we wish to efficiently retrieve the t categorical units containing this pattern and belonging to the category. We propose several efficient solutions for this problem. One of them uses n(log σ(1+o(1)) + log D + O(h)) + O(Δ) bits of space and O(|p|+t) query time, where n is the total length of the documents, σ is the size of the alphabet used in the documents, and Δ is the total number of nodes in the category tree. Another solution uses n(log σ(1+o(1)) + O(log D)) + O(Δ) + O(D log n) bits of space and O(|p| + t log D) query time. We finally propose other solutions which are more space-efficient at the expense of a slight increase in query time.
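To make the query semantics concrete, here is a minimal brute-force sketch in Python. It is not one of the solutions proposed in the paper: it scans every document and so comes nowhere near the O(|p|+t) query time stated above. The `Node` class, the `query` function, the convention that documents are the leaves of the tree and that the root is level 0, and the toy taxonomy are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical category-tree node; leaves carry a document's text."""
    name: str
    children: List["Node"] = field(default_factory=list)
    text: Optional[str] = None  # set only on leaves (documents)


def contains(node: Node, p: str) -> bool:
    """True if some document in the subtree of `node` contains pattern p."""
    if node.text is not None:
        return p in node.text
    return any(contains(child, p) for child in node.children)


def query(root: Node, p: str, level: int) -> List[str]:
    """Names of the units at depth `level` (root = 0) whose subtree contains p."""
    frontier = [root]
    for _ in range(level):
        # Descend one level of the category tree.
        frontier = [c for n in frontier for c in n.children]
    return [n.name for n in frontier if contains(n, p)]


# Toy taxonomy (invented data): root -> class -> genus -> document.
root = Node("root", [
    Node("Mammalia", [
        Node("Felis", [Node("doc1", text="ACGTAC")]),
        Node("Canis", [Node("doc2", text="GGTACA")]),
    ]),
    Node("Aves", [
        Node("Corvus", [Node("doc3", text="TTTGCA")]),
    ]),
])

print(query(root, "TAC", 1))  # ['Mammalia']
print(query(root, "TAC", 2))  # ['Felis', 'Canis']
print(query(root, "GCA", 2))  # ['Corvus']
```

Each reported unit appears once, however many of its documents match; this is the quantity t in the bounds above. The brute-force scan costs time proportional to the total document length n per query, which is the cost the indexed solutions are designed to avoid.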