HSGM: Hierarchical Segment-Graph Memory for Scalable Long-Text Semantics
Abstract
Semantic parsing of long documents remains challenging because pairwise composition and memory requirements grow quadratically with input length. We introduce Hierarchical Segment-Graph Memory (HSGM), a novel framework that decomposes an input of length $N$ into $M$ meaningful segments, constructs a Local Semantic Graph on each segment, and extracts compact summary nodes to form a Global Graph Memory. HSGM supports incremental updates: only newly arrived segments incur local graph construction and summary-node integration. Hierarchical Query Processing first locates relevant segments via top-$K$ retrieval over summary nodes and then performs fine-grained reasoning within their local graphs. Theoretically, HSGM reduces worst-case complexity from $O(N^2)$ to $O(Nk + (N/k)^2)$, with segment size $k \ll N$, and we derive Frobenius-norm bounds on the approximation error introduced by node summarization and sparsification thresholds. Empirically, on three benchmarks (long-document AMR parsing, segment-level semantic role labeling on OntoNotes, and legal event extraction), HSGM achieves a 2–4× inference speedup and a >60% reduction in peak memory while retaining 95% of baseline accuracy. Our approach unlocks scalable, accurate semantic modeling for ultra-long texts, enabling real-time and resource-constrained NLP applications.
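The sketch below is a minimal illustration of the pipeline the abstract describes: segment, build local graphs, summarize, then retrieve hierarchically. It is not the authors' implementation. It also makes several assumptions the abstract does not state. Segments are fixed windows of $k$ units rather than "meaningful" segments. Nodes are toy embedding vectors compared by cosine similarity. Each summary node is the mean of its segment's node embeddings. A single threshold `tau` sparsifies both local and summary-level edges. All names (`GlobalGraphMemory`, `build_local_graph`, `pairwise_ops`, `tau`) are hypothetical. The `pairwise_ops` helper spells out the complexity arithmetic: $M = N/k$ segments at $k^2$ pairs each give $Nk$, and pairing the $M$ summary nodes adds $(N/k)^2$.

```python
import math
from dataclasses import dataclass, field

Vector = list[float]

def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

@dataclass
class LocalGraph:
    nodes: list[Vector]           # one embedding per unit in the segment (toy vectors)
    edges: list[tuple[int, int]]  # sparsified: kept only if similarity >= tau
    summary: Vector               # compact summary node (assumption: mean embedding)

def build_local_graph(nodes: list[Vector], tau: float) -> LocalGraph:
    # Pairwise composition is O(k^2), but only inside one segment of size k.
    edges = [(i, j) for i in range(len(nodes)) for j in range(i + 1, len(nodes))
             if cosine(nodes[i], nodes[j]) >= tau]
    dim = len(nodes[0])
    summary = [sum(v[d] for v in nodes) / len(nodes) for d in range(dim)]
    return LocalGraph(nodes, edges, summary)

@dataclass
class GlobalGraphMemory:
    k: int      # segment size (hypothetical parameter name)
    tau: float  # sparsification threshold (hypothetical parameter name)
    segments: list[LocalGraph] = field(default_factory=list)
    summary_edges: list[tuple[int, int]] = field(default_factory=list)

    def add_segment(self, nodes: list[Vector]) -> None:
        # Incremental update: build only the new segment's local graph and link
        # its summary node to existing summary nodes; earlier graphs are untouched.
        g = build_local_graph(nodes, self.tau)
        new_id = len(self.segments)
        for old_id, old in enumerate(self.segments):
            if cosine(old.summary, g.summary) >= self.tau:
                self.summary_edges.append((old_id, new_id))
        self.segments.append(g)

    def ingest(self, stream: list[Vector]) -> None:
        # Simplifying assumption: fixed-size windows of k units as segments.
        for start in range(0, len(stream), self.k):
            self.add_segment(stream[start:start + self.k])

    def query(self, q: Vector, top_k: int) -> list[tuple[int, int]]:
        # Coarse stage: rank segments by their summary nodes only.
        ranked = sorted(range(len(self.segments)),
                        key=lambda s: cosine(q, self.segments[s].summary),
                        reverse=True)
        hits = []
        # Fine stage: search nodes only inside the top-K retrieved local graphs.
        for s in ranked[:top_k]:
            nodes = self.segments[s].nodes
            best = max(range(len(nodes)), key=lambda i: cosine(q, nodes[i]))
            hits.append((s, best))
        return hits

def pairwise_ops(n: int, k: int) -> tuple[int, int]:
    # Full composition costs N^2 pairs; the segmented scheme costs
    # M * k^2 (local graphs) + M^2 (summary nodes) with M = ceil(N / k).
    m = math.ceil(n / k)
    return n * n, m * k * k + m * m

if __name__ == "__main__":
    mem = GlobalGraphMemory(k=2, tau=0.9)
    mem.ingest([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(mem.query([0.0, 1.0], top_k=1))  # [(1, 0)]: segment 1, node 0
    print(pairwise_ops(10_000, 100))       # (100000000, 1010000)
```

With $N = 10{,}000$ and $k = 100$, the segmented count is about 1% of the full $N^2$ pairs. This gap is the source of the quadratic-to-near-linear saving the abstract claims, under the stated assumption that $k \ll N$.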