Fast Spatial Autocorrelation
Abstract
Physical or geographic location proves to be an important feature in many data science models, because many diverse natural and social phenomena have a spatial component. Spatial autocorrelation measures the extent to which locally adjacent observations of the same phenomenon are correlated. Although statistics like Moran's I and Geary's C are widely used to measure spatial autocorrelation, they are slow: all popular methods run in quadratic (n²) time, rendering them unusable for large data sets, or for long time-courses with moderate numbers of points. We propose a new SA statistic based on the notion that the variance observed when merging pairs of nearby clusters should increase slowly for spatially autocorrelated variables. We give a linear-time algorithm to calculate SA for a variable with an input agglomeration order (available at https://github.com/aamgalan/spatialautocorrelation). For a typical dataset of n ≈ 63,000 points, our SA measure can be computed in 1 second, versus 2 hours or more for Moran's I and Geary's C. Through simulation studies, we demonstrate that SA identifies spatial correlations in variables generated with a spatially dependent model half an order of magnitude earlier than either Moran's I or Geary's C. Finally, we prove several theoretical properties of SA: namely, that it behaves as a true correlation statistic and is invariant under addition of, or multiplication by, a constant.
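To make the quadratic cost of the baseline statistics concrete, the following is a minimal sketch of the textbook definitions of Moran's I and Geary's C over a dense weight matrix. It is not the code from the repository above and does not implement the SA statistic. The binary distance-threshold weighting, the `radius` parameter, and the function names `pairwise_weights`, `morans_i`, and `gearys_c` are assumptions chosen for illustration.

```python
import math


def pairwise_weights(points, radius):
    """Binary spatial weights (an illustrative assumption):
    w[i][j] = 1 if i != j and the two points lie within `radius`."""
    n = len(points)
    w = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(points[i], points[j]) <= radius:
                w[i][j] = 1.0
    return w


def morans_i(values, w):
    """Moran's I: (n / W) * sum_ij w_ij (x_i - m)(x_j - m) / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    total_w = sum(sum(row) for row in w)
    # The double loop over all pairs is the source of the n^2 running time.
    num = sum(w[i][j] * dev[i] * dev[j] for i in range(n) for j in range(n))
    den = sum(d * d for d in dev)
    return (n / total_w) * num / den


def gearys_c(values, w):
    """Geary's C: ((n - 1) / (2W)) * sum_ij w_ij (x_i - x_j)^2 / sum_i (x_i - m)^2."""
    n = len(values)
    mean = sum(values) / n
    total_w = sum(sum(row) for row in w)
    num = sum(
        w[i][j] * (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
    )
    den = sum((v - mean) ** 2 for v in values)
    return ((n - 1) / (2 * total_w)) * num / den


# Toy data: six points on a line carrying a smooth gradient.
points = [(float(i), 0.0) for i in range(6)]
values = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
w = pairwise_weights(points, radius=1.0)
print(morans_i(values, w), gearys_c(values, w))
```

On this smooth gradient Moran's I comes out positive and Geary's C falls below 1, both of which indicate positive spatial autocorrelation. Every function here visits all n² pairs of points, which is the cost the abstract contrasts with the linear-time SA computation.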