REGREACT: Self-Correcting Multi-Agent Pipelines for Structured Regulatory Information Extraction
Abstract
Extracting structured, machine-readable compliance criteria from regulatory documents remains an open challenge. Single-pass language models hallucinate structural elements, lose hierarchical relationships, and fail to resolve inter-document dependencies. We introduce RegReAct, a self-correcting multi-agent framework that decomposes regulatory information extraction into seven specialized stages, each with an Observe--Diagnose--Repair (ODR) loop that validates outputs against the source, correcting not only model hallucinations but also cross-reference errors in the regulations themselves. To ensure structural accuracy, RegReAct constructs a typed criterion graph; to ensure completeness, it resolves external dependencies by retrieving, summarizing, and embedding referenced legal content inline, producing self-contained outputs. Applying RegReAct to three EU Taxonomy Delegated Acts, we construct a dataset comprising 242 activities with over 4,800 hierarchical criteria, thresholds, and enriched source summaries. Evaluation against a GPT-4o single-pass baseline confirms that RegReAct outperforms it across all structural and semantic metrics. Code and data will be made publicly available: https://github.com/RECOR-Benchmark/RECOR
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.