Pointer: Linear-Complexity Long-Range Modeling without Pre-training

Zixi Li

Pointer: Linear-Complexity Long-Range Modeling without Pre-training

Abstract

We introduce Pointer, a novel architecture that achieves linear O(NK) complexity for long-range sequence modeling while maintaining superior performance without requiring pre-training. Unlike standard attention mechanisms that compute O(N2) pairwise interactions, our approach uses layer-wise pointer chaining where each layer's pointer selection depends on previous layer's pointer positions, creating explicit long-distance connections through pointer chains. We demonstrate that this architecture achieves 2--10× speedup on long sequences compared to standard transformers, maintains >95\% accuracy on copy tasks at distances up to 2048 tokens, and learns interpretable pointer patterns that reveal structured dependency modeling. Our experiments on efficiency benchmarks, long-range dependency tasks, and interpretability analysis show that Pointer offers a compelling alternative to attention mechanisms for scenarios requiring efficient long-range modeling without pre-training dependencies.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…