A systematic framework for generating novel experimental hypotheses from language models
Abstract
Neural language models (LMs) have been shown to capture complex linguistic patterns, yet their utility in understanding human language and more broadly, human cognition, remains debated. While existing work in this area often evaluates human-machine alignment, few studies attempt to translate findings from this enterprise into novel insights about humans. To this end, we propose a systematic framework for hypothesis generation that uses LMs to simulate outcomes of experiments that do not yet exist in the literature. We instantiate this framework in the context of a specific research question in child language development: dative verb acquisition and cross-structural generalization. Through this instantiation, we derive novel, untested hypotheses: the alignment between argument ordering and discourse prominence features of exposure contexts modulates how children generalize new verbs to unobserved structures. Additionally, we also design a set of experiments that can test these hypotheses in the lab with children. This work contributes both a domain-general framework for systematic hypothesis generation via simulated learners and domain-specific, lab-testable hypotheses for child language acquisition research.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.