A weighted angle distance on strings

Abstract

We define a multi-scale metric d on strings by aggregating the angle distances between n-gram count vectors over all n, with weights exponential in n. We benchmark d in DBSCAN clustering against edit-distance and n-gram baselines, give a linear-time suffix-tree algorithm for evaluating it, prove metric and stability properties (including robustness under tandem-repeat stutters), and characterize its isometries.
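
To make the construction concrete, here is a minimal sketch of one way such a distance could be computed. The abstract does not give the exact formula, so several choices below are assumptions rather than the paper's definition: the weight for scale n is taken to be lam**n with a hypothetical parameter lam in (0, 1), the angle distance is taken to be the arccosine of the cosine similarity, and an empty count vector is treated as being at angle pi/2 from any non-empty one. The sketch is a naive quadratic-time evaluation, not the linear-time suffix-tree algorithm the paper describes.

```python
import math
from collections import Counter


def ngram_counts(s, n):
    """Count every length-n substring of s (empty Counter if n > len(s))."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def angle(u, v):
    """Angle in radians between two sparse count vectors.

    Assumption: two empty vectors are at angle 0, and an empty vector is
    at angle pi/2 from a non-empty one. The paper may use another convention.
    """
    if not u and not v:
        return 0.0
    if not u or not v:
        return math.pi / 2
    dot = sum(c * v[g] for g, c in u.items())  # Counter gives 0 for missing keys
    uu = sum(c * c for c in u.values())
    vv = sum(c * c for c in v.values())
    cos = dot / math.sqrt(uu * vv)
    return math.acos(max(-1.0, min(1.0, cos)))  # clamp against rounding error


def weighted_angle_distance(s, t, lam=0.5):
    """Sum over n of lam**n times the angle between the n-gram count vectors.

    Assumption: lam**n is the 'exponential weight'; lam is a hypothetical name.
    """
    max_n = max(len(s), len(t))
    return sum(
        lam ** n * angle(ngram_counts(s, n), ngram_counts(t, n))
        for n in range(1, max_n + 1)
    )


if __name__ == "__main__":
    print(weighted_angle_distance("abab", "abab"))
    print(weighted_angle_distance("abab", "ababab"))
    print(weighted_angle_distance("aaa", "bbb"))
```

Under these assumptions, identical strings are at distance zero, and strings with no symbol in common reach the largest value at every scale. With lam below 1, short n-grams dominate the sum, so the choice of lam controls how much long-range structure contributes.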
