External memory bisimulation reduction of big graphs

Abstract

In this paper, we present, to our knowledge, the first known I/O efficient solutions for computing the k-bisimulation partition of a massive directed graph, and performing maintenance of such a partition upon updates to the underlying graph. Ubiquitous in the theory and application of graph data, bisimulation is a robust notion of node equivalence which intuitively groups together nodes in a graph which share fundamental structural features. k-bisimulation is the standard variant of bisimulation where the topological features of nodes are only considered within a local neighborhood of radius k≥slant 0. The I/O cost of our partition construction algorithm is bounded by O(k· sort(||) + k· scan(||) + sort(||)), while our maintenance algorithms are bounded by O(k· sort(||) + k· sort(||)). The space complexity bounds are O(||+||) and O(k·||+k·||), resp. Here, || and || are the number of disk pages occupied by the input graph's edge set and node set, resp., and sort(n) and scan(n) are the cost of sorting and scanning, resp., a file occupying n pages in external memory. Empirical analysis on a variety of massive real-world and synthetic graph datasets shows that our algorithms perform efficiently in practice, scaling gracefully as graphs grow in size.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…