An invertible transform for efficient string matching in labeled digraphs

Abstract

Let G = (V, E) be a digraph in which each vertex is unlabeled, each edge is labeled by a character from some alphabet, and any two edges with both the same head and the same tail have different labels. The powerset construction gives a transform of G into a weakly connected digraph G' = (V', E') that enables solving the decision problem of whether there is a walk in G matching an arbitrarily long query string q in time linear in |q| and independent of |E| and |V|. We show that G is uniquely determined by G' when, for every v ∈ V, there is some distinct string s over the alphabet such that v is the origin of a closed walk in G matching s, and no other walk in G matches s unless it starts and ends at v. We then exploit this invertibility condition to strategically alter any G so that its transform G' enables retrieval of all t terminal vertices of walks in the unaltered G matching q in O(|q| + t |V|) time. We conclude by proposing two defining properties of a class of transforms that includes the Burrows-Wheeler transform and the transform presented here.
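To make the decision problem concrete, the following is a minimal sketch of the textbook powerset (subset) construction applied to an edge-labeled digraph. It is not taken from the paper and may differ from the transform the paper defines: the function names `powerset_transform` and `has_walk`, the edge-triple input format, and the choice of the full vertex set as the starting state are all assumptions made for illustration.

```python
from collections import deque


def powerset_transform(edges):
    """Hypothetical helper: build a deterministic transition table from
    labeled edges given as (tail, label, head) triples.

    Each state is a frozenset of vertices of G. The start state is all of V,
    because a matching walk may begin at any vertex (an assumption of this
    sketch).
    """
    out = {}          # vertex -> list of (label, head) for its outgoing edges
    vertices = set()
    for tail, label, head in edges:
        vertices.update((tail, head))
        out.setdefault(tail, []).append((label, head))

    start = frozenset(vertices)
    delta = {}        # (state, label) -> next state
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        # Group the heads reachable from this state by edge label.
        targets = {}
        for v in state:
            for label, head in out.get(v, ()):
                targets.setdefault(label, set()).add(head)
        for label, heads in targets.items():
            nxt = frozenset(heads)
            delta[(state, label)] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return start, delta


def has_walk(start, delta, q):
    """Decide whether some walk in G matches q, one table lookup per character."""
    state = start
    for ch in q:
        state = delta.get((state, ch))
        if state is None:   # no vertex in the current state has an edge labeled ch
            return False
    return True


# Toy graph, invented for this example.
edges = [(0, "a", 1), (1, "b", 2), (2, "a", 0), (1, "a", 1)]
start, delta = powerset_transform(edges)
print(has_walk(start, delta, "aab"))
print(has_walk(start, delta, "bb"))
```

Once the table is built, a query costs one expected constant-time dictionary lookup per character of q, regardless of the size of G. The table itself can have exponentially many states in |V| in the worst case, which this sketch makes no attempt to mitigate.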
