Sequence Graph Transform (SGT): A Feature Embedding Function for Sequence Data Mining

Abstract

Sequence feature embedding is a challenging task due to the unstructuredness of sequence, i.e., arbitrary strings of arbitrary length. Existing methods are efficient in extracting short-term dependencies but typically suffer from computation issues for the long-term. Sequence Graph Transform (SGT), a feature embedding function, that can extract a varying amount of short- to long-term dependencies without increasing the computation is proposed. SGT's properties are analytically proved for interpretation under normal and uniform distribution assumptions. SGT features yield significantly superior results in sequence clustering and classification with higher accuracy and lower computation as compared to the existing methods, including the state-of-the-art sequence/string Kernels and LSTM.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…