Faster Longest Common Extension Queries in Strings over General Alphabets

Abstract

Longest common extension (LCE) queries, often called longest common prefix queries, constitute a fundamental building block in multiple string algorithms, for example computing runs and approximate pattern matching. We show that a sequence of $q$ LCE queries for a string of size $n$ over a general ordered alphabet can be realized in $O(q \log\log n + n \log^* n)$ time making only $O(q + n)$ symbol comparisons. Consequently, all runs in a string over a general ordered alphabet can be computed in $O(n \log\log n)$ time making $O(n)$ symbol comparisons. Our results improve upon a solution by Kosolobov (Information Processing Letters, 2016), who gave an algorithm with $O(n \log^{2/3} n)$ running time and conjectured that $O(n)$ time is possible. We make significant progress towards resolving this conjecture. Our techniques extend to the case of general unordered alphabets, where the time increases to $O(q \log n + n \log^* n)$. The main tools are difference covers and the disjoint-sets data structure.
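To make the query itself concrete, here is a minimal sketch of a naive LCE computation. It is only the direct-scanning baseline, not the algorithm of this paper: it uses nothing but symbol equality tests, in the spirit of the general-alphabet model, yet a single query may cost up to n comparisons. The function name `lce` and the 0-based indexing are assumptions made for illustration.

```python
def lce(s, i, j):
    """Return the length of the longest common prefix of s[i:] and s[j:].

    Naive baseline (hypothetical helper, not the paper's method): scan both
    suffixes in lockstep, using only equality comparisons between symbols.
    """
    n = len(s)
    k = 0
    # Stop at the first mismatch or when either suffix runs out.
    while i + k < n and j + k < n and s[i + k] == s[j + k]:
        k += 1
    return k


if __name__ == "__main__":
    # Suffixes "anana" and "ana" of "banana" share the prefix "ana".
    print(lce("banana", 1, 3))  # prints 3
```

Answering q queries this way takes O(qn) time and comparisons in the worst case, which is the cost that the bounds stated in the abstract improve upon.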
