S3GND: An Effective Learning-Based Approach for Subgraph Similarity Search Under Generalized Neighbor Difference Semantics (Technical Report)
Abstract
Subgraph similarity search over large-scale graphs is a fundamental task that retrieves subgraphs similar to a given query graph from a data graph, and it plays a crucial role in real-world applications such as protein discovery, social network analysis, and recommendation systems. While prior works on subgraph similarity search studied various graph similarity metrics, in this paper we propose a novel graph similarity semantics, generalized neighbor difference (GND), that accounts for both the keyword-set relationships between vertices and edge-weight differences. We formulate the problem of subgraph similarity search under the generalized neighbor difference semantics (S3GND), which retrieves those subgraphs similar to a query graph q under GND semantics. To efficiently tackle the S3GND problem, we propose an effective learning-based approach, which constructs a keyword hypergraph from the data graph and trains a hypergraph neural network (HGNN) model to obtain high-quality keyword embedding representations. We design three effective pruning strategies (keyword embedding MBR, vertex-level ND lower bound, and graph-level GND lower bound pruning) to rule out false alarms among candidate vertices/subgraphs, and devise a tree-based indexing mechanism to facilitate efficient S3GND query answering. We develop an efficient S3GND query-processing algorithm that traverses the index, applies the pruning strategies, and returns the actual S3GND answers. Finally, we conduct extensive experiments to verify the effectiveness and efficiency of our proposed S3GND approach over both real and synthetic graphs.
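The abstract names keyword embedding MBR pruning without defining it, so the following is only a generic sketch of how a minimum bounding rectangle (MBR) over embedding vectors can serve as a cheap filter. It is not the paper's rule. It assumes, hypothetically, that a candidate vertex matches a query vertex only if its keyword set is a superset of the query's keyword set. Under that assumption the query's MBR must lie inside the candidate's MBR, so a failed containment test safely rules the candidate out. The function names and the toy two-dimensional embeddings are illustrative and do not come from the paper.

```python
from typing import Dict, List, Sequence, Set, Tuple

# A box is (per-dimension lower bounds, per-dimension upper bounds).
Box = Tuple[List[float], List[float]]


def mbr(points: Sequence[Sequence[float]]) -> Box:
    """Return the minimum bounding rectangle of a non-empty set of points."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return lows, highs


def contains(outer: Box, inner: Box) -> bool:
    """True if `inner` lies entirely within `outer` in every dimension."""
    return all(
        o_lo <= i_lo and i_hi <= o_hi
        for o_lo, i_lo, i_hi, o_hi in zip(outer[0], inner[0], inner[1], outer[1])
    )


def can_prune(
    query_keywords: Set[str],
    candidate_keywords: Set[str],
    embedding: Dict[str, Sequence[float]],
) -> bool:
    """Filter step under the ASSUMED superset match condition.

    If the candidate's keywords were a superset of the query's, the
    candidate's MBR would contain the query's MBR. A failed containment
    test therefore proves the candidate cannot match. A passed test
    proves nothing: the candidate still needs exact verification.
    """
    q_box = mbr([embedding[k] for k in query_keywords])
    c_box = mbr([embedding[k] for k in candidate_keywords])
    return not contains(c_box, q_box)


# Toy embeddings (hypothetical values, not learned by an HGNN).
toy_embedding = {"a": (0.0, 0.0), "b": (1.0, 1.0), "c": (2.0, 0.5)}

kept = can_prune({"a", "b"}, {"a", "b", "c"}, toy_embedding)   # superset survives
dropped = can_prune({"a", "b"}, {"a", "c"}, toy_embedding)     # lacks "b", pruned
```

The filter is one-sided by design: it may keep a candidate that later fails exact verification, but under the stated assumption it never discards a true match.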