A weighted angle distance on strings
Abstract
We define a multi-scale metric d on strings by aggregating the angle distances between n-gram count vectors over all n, with weights exponential in n. We benchmark d in DBSCAN clustering against edit-distance and n-gram baselines, give a linear-time suffix-tree algorithm for evaluating it, prove metric and stability properties (including robustness under tandem-repeat stutters), and characterize its isometries.
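As a rough illustration of the construction described above, the following Python sketch computes a weighted sum of angles between n-gram count vectors. It is a naive quadratic-time version, not the linear-time suffix-tree algorithm the abstract mentions. The function names, the weight form rho**n with the parameter rho, the default rho = 0.5, and the convention for strings shorter than n (angle 0 if both count vectors are zero, pi/2 if only one is) are all assumptions made for this example; the paper's exact normalization may differ.

```python
import math
from collections import Counter


def ngram_counts(s, n):
    """Count every length-n substring of s (empty if len(s) < n)."""
    return Counter(s[i:i + n] for i in range(len(s) - n + 1))


def angle(u, v):
    """Angle between two sparse count vectors stored as Counters.

    Assumed convention: two zero vectors are at angle 0, and a zero
    vector is at angle pi/2 from any nonzero vector.
    """
    nu = sum(c * c for c in u.values())
    nv = sum(c * c for c in v.values())
    if nu == 0 and nv == 0:
        return 0.0
    if nu == 0 or nv == 0:
        return math.pi / 2
    dot = sum(c * v[g] for g, c in u.items())
    cos = dot / math.sqrt(nu * nv)
    # Clamp to guard against floating-point drift outside [-1, 1].
    return math.acos(max(-1.0, min(1.0, cos)))


def weighted_angle_distance(s, t, rho=0.5):
    """Sum of rho**n * angle over n = 1 .. max(len(s), len(t)).

    rho is a hypothetical weight parameter. For larger n both count
    vectors are zero, so those terms contribute nothing under the
    convention above.
    """
    max_n = max(len(s), len(t))
    return sum(
        rho ** n * angle(ngram_counts(s, n), ngram_counts(t, n))
        for n in range(1, max_n + 1)
    )
```

For example, `weighted_angle_distance("a", "b")` has a single term: the two 1-gram vectors are orthogonal, so the result is 0.5 * pi/2. A matrix of such pairwise values could be passed to a clustering routine that accepts precomputed distances.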