SWAN: A Generic Framework for Auditing Textual Conversational Systems

Abstract

We present a simple and generic framework for auditing a given textual conversational system, given some samples of its conversation sessions as its input. The framework computes a SWAN (Schematised Weighted Average Nugget) score based on nugget sequences extracted from the conversation sessions. Following the approaches of S-measure and U-measure, SWAN utilises nugget positions within the conversations to weight the nuggets based on a user model. We also present a schema of twenty (+1) criteria that may be worth incorporating in the SWAN framework. In our future work, we plan to devise conversation sampling methods that are suitable for the various criteria, construct seed user turns for comparing multiple systems, and validate specific instances of SWAN for the purpose of preventing negative impacts of conversational systems on users and society. This paper was written while preparing for the ICTIR 2023 keynote (to be given on July 23, 2023).

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…