Aggregated Individual Reporting for Post-Deployment Evaluation

Abstract

The need for developing model evaluations beyond static benchmarking, especially in the post-deployment phase, is now well-understood. At the same time, concerns about the concentration of power in deployed AI systems have sparked a keen interest in 'democratic' or 'public' AI. In this work, we bring these two ideas together by proposing mechanisms for aggregated individual reporting (AIR), a framework for post-deployment evaluation that relies on individual reports from the public. An AIR mechanism allows those who interact with a specific, deployed (AI) system to report when they feel that they may have experienced something problematic; these reports are then aggregated over time, with the goal of evaluating the relevant system in a fine-grained manner. This position paper argues that individual experiences should be understood as an integral part of post-deployment evaluation, and that the scope of our proposed aggregated individual reporting mechanism is a practical path to that end. On the one hand, individual reporting can identify substantively novel insights about safety and performance; on the other, aggregation can be uniquely useful for informing action. From a normative perspective, the post-deployment phase completes a missing piece in the conversation about 'democratic' AI. As a pathway to implementation, we provide a workflow of concrete design decisions and pointers to areas requiring further research and methodological development.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…