Compressibility Barriers to Neighborhood-Preserving Data Visualizations
Abstract
To what extent is it possible to visualize high-dimensional data in two- or three-dimensional plots? We reframe this question in terms of embedding n-vertex graphs (representing the neighborhood structure of the input points) into metric spaces of low doubling dimension d in a way that keeps neighbors close and non-neighbors far. This notion of neighbor preservation is a considerably weaker embedding constraint than near-isometry, yet it is similarly demanding in terms of how the minimum required dimension scales with the number of points. We show that for an overwhelming fraction of graphs, d = Θ(log n) is both necessary and sufficient for neighbor preservation. Even sparse regular graphs, which represent more restricted neighborhood connectivity structures, typically require d = Ω(log n / log log n). The landscape changes dramatically when embedding into normed spaces: general graphs become exponentially harder to embed, requiring d = Ω(n), while sparse regular graphs continue to admit d = O(log n). Finally, we study the implications of these results for visualizing data with intrinsic cluster structure. We show that graphs produced from a planted partition model with k clusters on n points typically require d = Ω(log n), even when the cluster structure is salient. These results challenge the aspiration that constant-dimensional visualizations can faithfully preserve neighborhood structure.
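To make the idea of keeping neighbors close and non-neighbors far concrete, the following is a minimal sketch under assumptions that are not taken from the paper: the target space is the Euclidean plane, and an embedding counts as neighbor-preserving when every edge has length at most a fixed threshold and every non-edge has length strictly greater than it. The function name `preserves_neighbors` and the `threshold` parameter are hypothetical, and the paper's formal definition may differ.

```python
import itertools
import math


def preserves_neighbors(edges, points, threshold=1.0):
    """Check an ASSUMED threshold notion of neighbor preservation.

    edges:     iterable of (u, v) vertex-index pairs (undirected graph)
    points:    list of coordinate tuples, one per vertex
    threshold: assumed cutoff; neighbors must be within it, non-neighbors beyond it
    """
    edge_set = {frozenset(e) for e in edges}
    for u, v in itertools.combinations(range(len(points)), 2):
        dist = math.dist(points[u], points[v])
        is_edge = frozenset((u, v)) in edge_set
        # A pair is consistent only if "is a neighbor" matches "is close".
        if is_edge != (dist <= threshold):
            return False
    return True


# Corners of the unit square: sides have length 1, diagonals have length sqrt(2).
square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# The 4-cycle matches the square exactly: edges are sides, non-edges are diagonals.
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]

# The path 0-1-2-3 does not: vertices 0 and 3 are non-neighbors but sit at distance 1.
path_edges = [(0, 1), (1, 2), (2, 3)]

print(preserves_neighbors(cycle_edges, square))
print(preserves_neighbors(path_edges, square))
```

Under this assumed definition the same point set is a valid picture of the 4-cycle but not of the path. That shows the constraint concerns only which pairs are close, not exact distances, which is why it is weaker than near-isometry.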