Cross-Lingual Speaker Identification Using Distant Supervision

Abstract

Speaker identification, determining which character said each utterance in literary text, benefits many downstream tasks. Most existing approaches use expert-defined rules or rule-based features to directly approach this task, but these approaches come with significant drawbacks, such as lack of contextual reasoning and poor cross-lingual generalization. In this work, we propose a speaker identification framework that addresses these issues. We first extract large-scale distant supervision signals in English via general-purpose tools and heuristics, and then apply these weakly-labeled instances with a focus on encouraging contextual reasoning to train a cross-lingual language model. We show that the resulting model outperforms previous state-of-the-art methods on two English speaker identification benchmarks by up to 9% in accuracy and 5% with only distant supervision, as well as two Chinese speaker identification datasets by up to 4.7%.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…