Partial Identification from LLM Prompts

Abstract

Large language models are increasingly used as binary classifiers when the true label is latent. We study partial identification of the prevalence θ = P(X* = 1) from panels of LLM reports whose errors may be arbitrarily dependent given the truth. The design of replication determines the observable, and hence the identifying content: repeated prompts to one model yield a count, several named models yield a response vector, and the two designs combined yield a response matrix. Cast as a two-component finite mixture, the problem makes the identification failure transparent: absent restrictions that separate the latent components, the prevalence θ is completely unidentified, and weak stochastic-ordering restrictions (first-order dominance, monotone likelihood ratio, mean ordering) leave the identified set at [0,1]. Identifying power comes instead from externally calibrated scores and events, which discipline the mixture in the spirit of the misclassification and corrupted-data literature. We characterize the resulting bounds, establishing validity and sharpness, and give an exact account of the identifying information in the full score distribution beyond its mean. When named models are asked repeated versions of the same question, what identifies θ is not the number of positive answers but which models agree across prompts, a feature that a vote count discards. An extension derives implied bounds on regression coefficients when X* is a regressor of interest that is not directly observed.
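
The identification failure described in the abstract can be illustrated with a small numerical sketch. This is an illustration under stated assumptions, not the paper's construction: the observed distribution `p_obs` over the count of positive answers from two prompts is made up, and the helper names `complete_mixture` and `mix` are hypothetical. For a candidate prevalence theta and a candidate distribution `f1` of the count given X* = 1, the code solves for the distribution `f0` given X* = 0 that reproduces `p_obs` exactly. Several very different values of theta pass, including cases where `f1` equals `f0`, which satisfy any weak ordering restriction trivially.

```python
# Hypothetical illustration: many (theta, f1, f0) triples generate the same
# observed distribution, so the data alone cannot pin down theta.

def complete_mixture(p, theta, f1, tol=1e-12):
    """Solve theta*f1 + (1-theta)*f0 = p for f0.

    Returns f0 as a list, or None when no valid (nonnegative) f0 exists.
    """
    if not 0.0 <= theta < 1.0:
        raise ValueError("theta must lie in [0, 1)")
    f0 = [(pj - theta * fj) / (1.0 - theta) for pj, fj in zip(p, f1)]
    if any(v < -tol for v in f0):
        return None  # this (theta, f1) pair is inconsistent with p
    return [max(v, 0.0) for v in f0]  # clip tiny negative rounding noise


def mix(theta, f1, f0):
    """Observable distribution implied by a two-component mixture."""
    return [theta * a + (1.0 - theta) * b for a, b in zip(f1, f0)]


# Assumed (made-up) distribution of the number of positive answers in 2 prompts.
p_obs = [0.4, 0.3, 0.3]

# Candidate prevalences paired with candidate distributions given X* = 1.
candidates = [
    (0.1, p_obs),            # components identical: weak ordering holds trivially
    (0.5, [0.2, 0.3, 0.5]),  # X* = 1 component shifted toward more positives
    (0.9, p_obs),            # components identical again, very different theta
]

results = []
for theta, f1 in candidates:
    f0 = complete_mixture(p_obs, theta, f1)
    results.append((theta, f1, f0))
    print(f"theta={theta}: f1={f1}, f0={f0}")
```

Every triple in `results` reproduces `p_obs`, so prevalences of 0.1, 0.5, and 0.9 are observationally equivalent here. A candidate is ruled out only when the implied `f0` has a negative entry, which is the sense in which extra restrictions on the components, rather than the data alone, must supply identifying power.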
