Fast Approximate CoSimRanks via Random Projections
Abstract
Given a graph $G$ with $n$ nodes and two nodes $u, v \in G$, the CoSimRank value $s(u,v)$ quantifies the similarity between $u$ and $v$ based on graph topology. Compared to SimRank, CoSimRank has been shown to be more accurate and effective in many real-world applications, including synonym expansion, lexicon extraction, and entity relatedness in knowledge graphs. Computing all pairwise CoSimRanks in $G$ is highly expensive, so existing solutions all focus on approximate algorithms. To attain a desired absolute accuracy guarantee $\epsilon$, the state-of-the-art approximate algorithm for computing all pairwise CoSimRanks requires $O(n^3 \log_2(\ln(1/\epsilon)))$ time, which is prohibitively expensive even when $\epsilon$ is large. In this paper, we propose a fast randomized algorithm for computing all pairwise CoSimRank values. The basic idea of the algorithm is to approximate the $n \times n$ matrix multiplications in CoSimRank computation via random projection. Theoretically, the algorithm runs in $O\left(\frac{n^2 \ln(n)}{\epsilon^2} \ln(1/\epsilon)\right)$ time while ensuring an absolute error of at most $\epsilon$ in each CoSimRank value in $G$ with high probability. Extensive experiments using six real graphs demonstrate that the algorithm is orders of magnitude faster than the state of the art. In particular, on a million-edge Twitter graph, it answers the $\epsilon$-approximate ($\epsilon = 0.1$) all-pairwise CoSimRank query within 4 hours using a single commodity server, while existing solutions fail to terminate within a day.
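To make the random-projection idea concrete, the following is a minimal sketch and not the algorithm proposed in the paper. It assumes the common series form of CoSimRank, $s(u,v) = \sum_{k \ge 0} c^k \langle p^{(k)}(u), p^{(k)}(v) \rangle$, truncated after a fixed number of iterations, and replaces each $n \times n$ product with a product of Gaussian sketches in the style of Johnson-Lindenstrauss. All function names, the decay factor `c`, the iteration count, and the projection dimension `d` are hypothetical choices for illustration.

```python
import numpy as np

def transition_matrix(adj):
    """Column-stochastic P: column u is the one-step walk distribution from u.
    Assumes every node has at least one out-edge."""
    out_deg = adj.sum(axis=1, keepdims=True)
    return (adj / out_deg).T

def cosimrank_exact(P, c=0.6, iters=10):
    """Truncated series S = sum_k c^k (P^k)^T (P^k), using full n x n products."""
    n = P.shape[0]
    W = np.eye(n)                    # W holds P^k; column u is p^(k)(u)
    S = np.zeros((n, n))
    for k in range(iters + 1):
        S += (c ** k) * (W.T @ W)    # n x n times n x n
        W = P @ W
    return S

def cosimrank_projected(P, d, c=0.6, iters=10, seed=0):
    """Same series, but inner products are estimated from d-dimensional sketches."""
    n = P.shape[0]
    rng = np.random.default_rng(seed)
    R = rng.standard_normal((d, n)) / np.sqrt(d)   # E[R^T R] = I
    Z = R.copy()                     # Z holds R P^k, a d x n sketch of P^k
    S = np.zeros((n, n))
    for k in range(iters + 1):
        S += (c ** k) * (Z.T @ Z)    # estimates (P^k)^T (P^k)
        Z = Z @ P
    return S

# Toy undirected graph: a 5-cycle with one chord (hypothetical example data).
adj = np.array([
    [0, 1, 0, 1, 1],
    [1, 0, 1, 0, 0],
    [0, 1, 0, 1, 0],
    [1, 0, 1, 0, 1],
    [1, 0, 0, 1, 0],
], dtype=float)

P = transition_matrix(adj)
S_exact = cosimrank_exact(P)
S_approx = cosimrank_projected(P, d=20000)
print("max absolute error:", np.max(np.abs(S_exact - S_approx)))
```

The toy run uses a projection dimension far larger than $n$ only so that the estimate is visibly close on a five-node graph. Any speedup would come from the opposite regime, where `d` is much smaller than $n$ and each sketched product costs on the order of $n^2 d$ operations instead of $n^3$.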