AMAQA: A Metadata-based QA Dataset for RAG Systems

Abstract

Retrieval-augmented generation (RAG) systems are widely used in question-answering (QA) tasks, but current benchmarks lack metadata integration, limiting their evaluation in scenarios requiring both textual data and external information. To address this, we present AMAQA, a new open-access QA dataset designed to evaluate tasks combining text and metadata. The integration of metadata is especially important in fields that require rapid analysis of large volumes of data, such as cybersecurity and intelligence, where timely access to relevant information is critical. AMAQA includes about 1.1 million English messages collected from 26 public Telegram groups, enriched with metadata such as timestamps and chat names. It also contains 20,000 hotel reviews with metadata. In addition, the dataset provides 2,600 high-quality QA pairs built across both domains, Telegram messages and hotel reviews, making AMAQA a valuable resource for advancing research on metadata-driven QA and RAG systems. Both Telegram messages and Hotel reviews are enriched with emotional tones or toxicity indicators. To the best of our knowledge, AMAQA is the first single-hop QA benchmark to incorporate metadata. We conduct extensive tests on the benchmark, setting a new reference point for future research. We show that leveraging metadata boosts accuracy from 0.5 to 0.86 for GPT-4o and from 0.27 to 0.76 for open source LLMs, highlighting the value of structured context. We conducted experiments on our benchmark to assess the performance of known techniques designed to enhance RAG, highlighting the importance of properly managing metadata throughout the entire RAG pipeline.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…