Leveraging Large Language Models for Automated Reproduction of Networking Research Results
Abstract
Code reproduction is a cornerstone of scientific validity, yet it remains a formidable challenge in computer networking research because open-source implementations are scarce and heterogeneous system architectures are complex. While Large Language Models (LLMs) have demonstrated potential in code generation, existing frameworks often fail to handle the long-context constraints and intricate logical dependencies involved in reproducing network systems from academic papers. To facilitate result reproduction, we introduce RepLLM, an end-to-end multi-agent framework that automates the transformation of networking research papers into executable code. RepLLM features a novel collaborative architecture comprising four specialized agents (Content Parsing, Architecture Design, Code Generation, and Audit & Repair) coordinated through an explicit Shared Memory mechanism to ensure global context consistency. By combining Chain-of-Thought LLM reasoning with a sandbox-isolated static-dynamic debugging methodology, our framework effectively resolves semantic discrepancies and runtime errors. Extensive evaluations on representative papers from SIGCOMM and NSDI demonstrate that RepLLM significantly outperforms state-of-the-art baselines in generating compile-ready and logically correct systems. Results further show that RepLLM facilitates the reproduction of 80% of the original benchmarks with only four hours of human intervention.
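To make the four-agent arrangement concrete, the following is a minimal illustrative sketch of agents that communicate only through a shared memory object. It is not RepLLM's implementation: the abstract does not specify interfaces, so the `SharedMemory` class, the function names, the keys, and the stubbed agent logic (no LLM calls, no sandbox, a static check only) are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class SharedMemory:
    """Hypothetical global context that every agent reads from and writes to."""
    paper_text: str
    entries: dict = field(default_factory=dict)
    log: list = field(default_factory=list)

    def write(self, agent: str, key: str, value: object) -> None:
        self.entries[key] = value
        self.log.append(f"{agent} wrote {key}")


def content_parsing(mem: SharedMemory) -> None:
    # Stub: a real agent would prompt an LLM to extract the system description.
    sections = [s.strip() for s in mem.paper_text.split(".") if s.strip()]
    mem.write("content_parsing", "sections", sections)


def architecture_design(mem: SharedMemory) -> None:
    # Stub: plan one module per parsed section.
    modules = [f"module_{i}" for i, _ in enumerate(mem.entries["sections"])]
    mem.write("architecture_design", "modules", modules)


def code_generation(mem: SharedMemory) -> None:
    # Stub: emit trivial source text for each planned module.
    code = {
        name: f"def {name}():\n    return {i}\n"
        for i, name in enumerate(mem.entries["modules"])
    }
    mem.write("code_generation", "code", code)


def audit_and_repair(mem: SharedMemory) -> None:
    # Static check only: try to compile each generated file and record failures.
    errors = {}
    for name, source in mem.entries["code"].items():
        try:
            compile(source, name, "exec")
        except SyntaxError as exc:
            errors[name] = str(exc)
    mem.write("audit_and_repair", "errors", errors)


# Assumed fixed ordering; the abstract does not state how agents are scheduled.
PIPELINE = [content_parsing, architecture_design, code_generation, audit_and_repair]


def run_pipeline(paper_text: str) -> SharedMemory:
    mem = SharedMemory(paper_text)
    for agent in PIPELINE:
        agent(mem)
    return mem


if __name__ == "__main__":
    result = run_pipeline("A scheduler assigns flows. A monitor reports load.")
    print(result.entries["modules"])
    print(result.log)
```

Because every agent takes the same `SharedMemory` argument and returns nothing, each stage sees everything written before it, which is one simple way to keep global context consistent across stages. A fuller version would presumably loop between code generation and audit until the recorded errors are empty.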