Corpus Prevalence of Multiple-Choice Question Options

Abstract

In recent years, corpus-driven AI methods, such as Large Language Models (LLMs), have seen widespread use in education. While their abilities look promising on the surface for tasks ranging from generating assessment materials to simulating student performance, we should be aware of the subtle nuances of their frequentist nature that might be affecting their behaviour. In this work, we focus on corpus frequency in the context of creating high-quality Multiple-Choice Questions (MCQs), specifically asking: What if corpus prevalence were enough to identify the correct answer to an MCQ? We propose a computational method for assessing the corpus prevalence of MCQ options in large text corpora by leveraging textual embeddings, and we apply it to both expert- and machine-generated MCQ sets. The key finding, across three large question sets, is that correct answers, independently of the question stem, are significantly more available in the corpus than incorrect options. Specifically, using Wikipedia as the retrieval corpus, we find that always selecting the most prevalent option leads to scores up to 9.0% above the random-guess baseline. We also find that MCQ distractors generated by LLMs often show patterns of prevalence similar to those of expert-created options, despite the LLMs' frequentist nature and their training on large collections of textual data. Moreover, we find that corpus prevalence does not necessarily correlate with how recognisable terms are to humans. This highlights the need to better understand how corpora are used in AI-driven methods for education, whether applied directly or indirectly via LLMs.
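To make the "always select the most prevalent option" strategy concrete, the following is a minimal sketch under stated assumptions, not the authors' method. The abstract says only that prevalence is measured with textual embeddings over a large corpus such as Wikipedia; here we assume a hypothetical scoring rule in which an option's prevalence is the number of corpus passages whose embedding has cosine similarity to the option's embedding at or above a hypothetical `threshold` of 0.8. The names `prevalence`, `MCQ`, `most_prevalent_accuracy` and `random_guess_baseline` are illustrative, and toy 2-D vectors stand in for real sentence embeddings. The question stem is ignored, matching the stem-independent setting described above.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def prevalence(option_vec: list[float], corpus_vecs: list[list[float]],
               threshold: float = 0.8) -> int:
    """Hypothetical prevalence score: how many corpus passages lie within
    `threshold` cosine similarity of the option's embedding."""
    return sum(1 for passage in corpus_vecs if cosine(option_vec, passage) >= threshold)


@dataclass
class MCQ:
    option_vecs: list[list[float]]  # one embedding per option; the stem is ignored
    correct: int                    # index of the correct option


def most_prevalent_accuracy(questions: list[MCQ], corpus_vecs: list[list[float]],
                            threshold: float = 0.8) -> float:
    """Accuracy of always picking the most prevalent option (ties go to the lowest index)."""
    hits = 0
    for q in questions:
        scores = [prevalence(v, corpus_vecs, threshold) for v in q.option_vecs]
        pick = max(range(len(scores)), key=lambda i: scores[i])
        hits += int(pick == q.correct)
    return hits / len(questions)


def random_guess_baseline(questions: list[MCQ]) -> float:
    """Expected accuracy of uniform random guessing: mean of 1/k over questions."""
    return sum(1 / len(q.option_vecs) for q in questions) / len(questions)


# Toy 2-D "embeddings" standing in for real passage and option embeddings.
corpus = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0]]
questions = [
    MCQ([[1.0, 0.0], [0.0, 1.0]], correct=0),
    MCQ([[0.0, 1.0], [1.0, 0.0]], correct=0),
    MCQ([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]], correct=0),
]
acc = most_prevalent_accuracy(questions, corpus)
base = random_guess_baseline(questions)
print(f"most-prevalent: {acc:.3f}, random: {base:.3f}, gap: {acc - base:+.3f}")
# prints "most-prevalent: 0.667, random: 0.417, gap: +0.250"
```

The baseline averages 1/k per question rather than assuming a fixed number of options, so question sets with mixed option counts are compared fairly. A real pipeline would replace the toy vectors with embeddings from a sentence-embedding model and the linear scan with a nearest-neighbour index over the corpus.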
