Principles and Practices of Large-Scale Code Analysis at Ant Group: A Data- and Logic-Oriented Approach

Abstract

Large-scale software development requires dynamic and multifaceted static code analysis that goes beyond the capabilities of traditional tools. Existing tools such as CodeQL lack cross-language analysis and can be time-consuming and resource-intensive. We present CodeFuse-Query, a data system tailored for large-scale code analysis. First, CodeFuse-Query adopts a Logic-Oriented Computation Design: it employs Datalog with a two-tiered schema, COREF, to convert source code into data facts, and uses Godel to express complex analysis tasks in logical terms. Second, CodeFuse-Query adopts a Domain-Optimized System Design that optimizes resource utilization, prioritizes data reusability, applies incremental code extraction, and introduces task-type characteristics specific to code changes. We present empirical results demonstrating CodeFuse-Query's robustness, scalability, and efficiency in large-scale, real-world scenarios at Ant Group, where it serves as core static analysis infrastructure. Deployed in production, CodeFuse-Query processes up to 10 billion lines of code daily across more than 300,000 distinct analysis tasks. CodeFuse-Query has been open-sourced.
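
To make the idea of "source code as data facts, analysis as logic" concrete, the following is a minimal sketch in Python. It assumes a hypothetical `calls(caller, callee)` relation of the kind a source-to-facts extractor might emit; this is not COREF's actual schema, and the rules are written as Datalog in comments rather than in Godel syntax. The function evaluates the two recursive rules for transitive reachability with semi-naive iteration, a standard bottom-up strategy for recursive Datalog; whether CodeFuse-Query's engine evaluates queries this way is not stated in the abstract.

```python
from collections import defaultdict

# Hypothetical base facts calls(caller, callee). Illustrative only; these are
# not COREF relations, just stand-ins for extracted call-site facts.
CALLS = {
    ("main", "parse"),
    ("parse", "lex"),
    ("lex", "read_file"),
    ("main", "report"),
}


def reachable(calls):
    """Derive every fact of the recursive rules

        reach(X, Y) :- calls(X, Y).
        reach(X, Z) :- reach(X, Y), calls(Y, Z).

    using semi-naive evaluation: each round joins only the facts derived in
    the previous round (the delta) with the base relation, so no join is
    recomputed on facts that were already fully explored.
    """
    # Index the base relation by caller so the join on Y is a lookup.
    succ = defaultdict(set)
    for x, y in calls:
        succ[x].add(y)

    reach = set(calls)  # first rule: every direct call is a reach fact
    delta = set(calls)  # facts that are new in the current round
    while delta:
        new = set()
        for x, y in delta:
            for z in succ.get(y, ()):
                if (x, z) not in reach:
                    new.add((x, z))
        reach |= new
        delta = new  # fixpoint reached when a round derives nothing new
    return reach


if __name__ == "__main__":
    facts = reachable(CALLS)
    # Query: which functions can transitively reach read_file?
    print(sorted(x for x, y in facts if y == "read_file"))
```

Running the script prints `['lex', 'main', 'parse']`. Separating fact extraction from the logic query is what makes the facts reusable: the same `calls` relation can answer many different queries without re-parsing the source.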
