Almost Optimal Bounds for Sublinear-Time Sampling of k-Cliques: Sampling Cliques is Harder Than Counting
Abstract
In this work, we consider the problem of sampling a $k$-clique in a graph from an almost uniform distribution in sublinear time in the general graph query model. Specifically, the algorithm should output each $k$-clique with probability $(1\pm \varepsilon)/n_k$, where $n_k$ denotes the number of $k$-cliques in the graph and $\varepsilon$ is a given approximation parameter. We prove that the query complexity of this problem is \[ \Theta^*\left(\max\left\{ \left(\frac{(n\alpha)^{k/2}}{n_k}\right)^{\frac{1}{k-1}} ,\; \min\left\{n\alpha,\frac{n\alpha^{k-1}}{n_k} \right\}\right\}\right), \] where $n$ is the number of vertices in the graph, $\alpha$ is its arboricity, and $\Theta^*(\cdot)$ suppresses the dependence on $(\log n/\varepsilon)^{O(k)}$. Interestingly, this establishes a separation between approximate counting and approximate uniform sampling in the sublinear regime. For example, if $k=3$, $\alpha = O(1)$, and $n_3$ (the number of triangles) is $\Theta(n)$, then we get a lower bound of $\Omega(n^{1/4})$ (for constant $\varepsilon$), while under these conditions, a $(1\pm \varepsilon)$-approximation of $n_3$ can be obtained by performing $\textrm{poly}(\log(n/\varepsilon))$ queries (Eden, Ron and Seshadhri, SODA20). Our lower bound follows from a construction of a family of graphs with arboricity $\alpha$ such that in each graph there are $n_k$ cliques (of size $k$), where one of these cliques is "hidden" and hence hard to sample. Our upper bound is based on defining a special auxiliary graph $H_k$, such that sampling edges almost uniformly in $H_k$ translates to sampling $k$-cliques almost uniformly in the original graph $G$. We then build on a known edge-sampling algorithm (Eden, Ron and Rosenbaum, ICALP19) to sample edges in $H_k$, where the challenge is to simulate queries to $H_k$ while being given access only to $G$.
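As a quick check of the triangle example, the arithmetic can be written out as follows. This is a sketch that assumes the bound has the form $\max\{\cdot,\min\{\cdot,\cdot\}\}$ over the two displayed terms, and that treats $\alpha$ as $\Theta(1)$ and $\varepsilon$ as a constant. Substituting $k=3$ and $n_3=\Theta(n)$: \[ \begin{aligned} \left(\frac{(n\alpha)^{k/2}}{n_k}\right)^{\frac{1}{k-1}} &= \left(\frac{\Theta(n^{3/2})}{\Theta(n)}\right)^{1/2} = \Theta(n^{1/4}), \\ \min\left\{n\alpha,\ \frac{n\alpha^{k-1}}{n_k}\right\} &= \min\left\{\Theta(n),\ \Theta(1)\right\} = \Theta(1), \\ \max\left\{\Theta(n^{1/4}),\ \Theta(1)\right\} &= \Theta(n^{1/4}). \end{aligned} \] Under these assumptions the first term dominates, which agrees with the $n^{1/4}$ lower bound quoted for sampling, in contrast to the polylogarithmic number of queries that suffices for approximate counting.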