Perceptual implications of automatic anonymization in pathological speech
Abstract
Automatic anonymization is increasingly used to enable ethical sharing of clinical speech, yet its perceptual and clinical consequences remain undercharacterized. We present a human-centered evaluation of automatically anonymized pathological speech, using a structured protocol with ten native and non-native German listeners spanning clinical and signal-processing expertise. The cohort comprised 180 German speakers from six groups: cleft lip and palate (CLP), Dysarthria, Dysglossia, Dysphonia, and adult and child controls. Each original recording and its automatically anonymized counterpart were evaluated on four tasks: zero-shot Turing-style discrimination, few-shot discrimination after brief familiarization, 5-point quality rating, and 4-point blinded clinical severity rating by a senior phoniatrician. Listeners detected anonymization with 91% zero-shot and 93% few-shot accuracy, with significant variation across disorders (p=0.008) that attenuated with familiarization. Perceived quality dropped by 30 percentage points on a 0-100 scale (p<0.001), reorganizing the perceived-quality hierarchy across groups. Native language modulated detectability but not quality degradation, while domain expertise modulated quality degradation but not detectability, a double dissociation between the two listener attributes. Speaker sex and age produced no detectable bias. Clinical severity ratings were preserved at near-perfect agreement in Dysarthria, Dysglossia, and Dysphonia (quadratic-weighted Cohen's kappa 0.87-0.94), with no recording shifting by more than one grade. Crucially, perceptual outcomes were decoupled from the standard computational privacy metric: the pathology with the strongest computational anonymization was the least perceptually conspicuous, and vice versa. These findings argue for disorder-stratified, listener-stratified, clinician-validated evaluation as the minimum standard for licensing anonymized speech for clinical use.
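The severity agreement above is reported as quadratic-weighted Cohen's kappa. For readers unfamiliar with the statistic, the following is a minimal sketch of how it can be computed for two sets of ordinal grades, for example a rating of the original recording and a rating of its anonymized counterpart. The function name, the 0-based grade coding, and the toy ratings are assumptions made for illustration; they are not the authors' code or data.

```python
from collections import Counter


def quadratic_weighted_kappa(ratings_a, ratings_b, num_grades):
    """Quadratic-weighted Cohen's kappa for two equal-length lists of
    integer grades coded 0 .. num_grades - 1 (coding is an assumption)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two non-empty rating lists of equal length")
    if num_grades < 2:
        raise ValueError("need at least two grades")

    n = len(ratings_a)
    observed = Counter(zip(ratings_a, ratings_b))  # joint counts
    marginal_a = Counter(ratings_a)                # per-grade counts, rating A
    marginal_b = Counter(ratings_b)                # per-grade counts, rating B
    max_sq_dist = (num_grades - 1) ** 2

    observed_disagreement = 0.0
    expected_disagreement = 0.0
    for i in range(num_grades):
        for j in range(num_grades):
            # Quadratic weight: 0 on the diagonal, 1 at maximal distance.
            weight = (i - j) ** 2 / max_sq_dist
            observed_disagreement += weight * observed[(i, j)] / n
            # Chance agreement from the product of the two marginals.
            expected_disagreement += weight * marginal_a[i] * marginal_b[j] / (n * n)

    if expected_disagreement == 0.0:
        raise ValueError("kappa is undefined when all ratings share one grade")
    return 1.0 - observed_disagreement / expected_disagreement


if __name__ == "__main__":
    # Toy 4-point severity grades, invented purely for illustration.
    original = [0, 1, 2, 3, 1, 2]
    anonymized = [0, 1, 2, 2, 1, 3]
    print(quadratic_weighted_kappa(original, anonymized, num_grades=4))
```

Because the weights grow with the squared distance between grades, a one-grade shift is penalized far less than a three-grade shift, which is why this variant is a common choice for ordinal clinical scales.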