Output-Optimal Algorithms for Join-Aggregate Queries

Abstract

One of the most celebrated results on computing join-aggregate queries defined over commutative semirings is the classic Yannakakis algorithm, proposed in 1981. It is known that the Yannakakis algorithm runs in $O(N + \mathrm{OUT})$ time for any free-connex query, where $N$ is the input size of the database and $\mathrm{OUT}$ is the output size of the query result. This is already output-optimal. However, only an upper bound of $O(N \cdot \mathrm{OUT})$ on the runtime is known for the large remaining class of acyclic but non-free-connex queries. Alternatively, one can convert a non-free-connex query into a free-connex one using tree decomposition techniques and then run the Yannakakis algorithm. This approach takes $O(N^{\#\mathsf{fn\text{-}subw}} + \mathrm{OUT})$ time, where $\#\mathsf{fn\text{-}subw}$ is the free-connex sub-modular width of the input query. However, none of these results is known to be output-optimal. In this paper, we show a matching lower and upper bound of $\Theta(N \cdot \mathrm{OUT}^{1 - 1/\mathsf{fn\text{-}fhtw}} + \mathrm{OUT})$ for computing general acyclic join-aggregate queries by semiring algorithms, where $\mathsf{fn\text{-}fhtw}$ is the free-connex fractional hypertree width of the query. For example, $\mathsf{fn\text{-}fhtw} = 1$ for free-connex queries, $\mathsf{fn\text{-}fhtw} = 2$ for line queries (a.k.a. chain matrix multiplication), and $\mathsf{fn\text{-}fhtw} = k$ for star queries (a.k.a. star matrix multiplication) with $k$ relations. While this measure has been defined before, we are the first to use it to characterize the output-optimal complexity of acyclic join-aggregate queries. To our knowledge, this is the first polynomial improvement over the Yannakakis algorithm in the last 40 years, and it completely resolves the open question of an output-optimal algorithm for computing acyclic join-aggregate queries.
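
To make the bound concrete, it can be instantiated for the three example query classes named in the abstract. The block below is only a worked substitution, not taken from the paper: it writes $\mathrm{OUT}$ for the output size and uses $w$ as an assumed shorthand for the free-connex fractional hypertree width.

```latex
% w is shorthand (assumed for this sketch) for the free-connex fractional hypertree width.
% Each line substitutes w into N * OUT^{1 - 1/w} + OUT.
\begin{align*}
w = 1 \ (\text{free-connex queries}) &: \quad N \cdot \mathrm{OUT}^{1 - 1/1} + \mathrm{OUT} = N + \mathrm{OUT} \\
w = 2 \ (\text{line queries}) &: \quad N \cdot \mathrm{OUT}^{1 - 1/2} + \mathrm{OUT} = N \sqrt{\mathrm{OUT}} + \mathrm{OUT} \\
w = k \ (\text{star queries, } k \text{ relations}) &: \quad N \cdot \mathrm{OUT}^{1 - 1/k} + \mathrm{OUT}
\end{align*}
```

The case $w = 1$ recovers the known $O(N + \mathrm{OUT})$ runtime for free-connex queries. For any finite $w \ge 1$ and $\mathrm{OUT} \ge 1$, the factor $\mathrm{OUT}^{1 - 1/w}$ is at most $\mathrm{OUT}$, so the first term never exceeds the earlier $N \cdot \mathrm{OUT}$ upper bound.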
