POIFormer: A Transformer-Based Framework for Accurate and Scalable Point-of-Interest Attribution
Abstract
Accurately attributing user visits to specific Points of Interest (POIs) is a foundational task for mobility analytics, personalized services, marketing and urban planning. However, POI attribution remains challenging due to GPS inaccuracies, typically ranging from 2 to 20 meters in real-world settings, and the high spatial density of POIs in urban environments, where multiple venues can coexist within a small radius (e.g., over 50 POIs within a 100-meter radius in dense city centers). Relying on proximity is therefore often insufficient for determining which POI was actually visited. We introduce POIFormer, a novel Transformer-based framework for accurate and efficient POI attribution. Unlike prior approaches that rely on limited spatiotemporal, contextual, or behavioral features, POIFormer jointly models a rich set of signals, including spatial proximity, visit timing and duration, contextual features from POI semantics, and behavioral features from user mobility and aggregated crowd behavior patterns--using the Transformer's self-attention mechanism to jointly model complex interactions across these dimensions. By leveraging the Transformer to model a user's past and future visits (with the current visit masked) and incorporating crowd-level behavioral patterns through pre-computed KDEs, POIFormer enables accurate, efficient attribution in large, noisy mobility datasets. Its architecture supports generalization across diverse data sources and geographic contexts while avoiding reliance on hard-to-access or unavailable data layers, making it practical for real-world deployment. Extensive experiments on real-world mobility datasets demonstrate significant improvements over existing baselines, particularly in challenging real-world settings characterized by spatial noise and dense POI clustering.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.