Impact of Shallow vs. Deep Relevance Judgments on BERT-based Reranking Models
Abstract
This paper investigates the impact of shallow versus deep relevance judgments on the performance of BERT-based reranking models in neural Information Retrieval. Shallow-judged datasets, characterized by numerous queries each with few relevance judgments, and deep-judged datasets, involving fewer queries with extensive relevance judgments, are compared. The research assesses how these datasets affect the performance of BERT-based reranking models trained on them. The experiments are run on the MS MARCO and LongEval collections. Results indicate that shallow-judged datasets generally enhance generalization and effectiveness of reranking models due to a broader range of available contexts. The disadvantage of the deep-judged datasets might be mitigated by a larger number of negative training examples.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.