Navigating the Reality Gap: On-Device Continual Adaptation of ASR for Clinical Telephony

Abstract

Automatic Speech Recognition (ASR) can significantly reduce documentation burden in clinical workflows, but standard models degrade sharply in real-world telephony settings where noisy audio, dialectal variation, and strict data residency constraints prevent cloud-based adaptation. We study this &#34;reality gap&#34; using Gram Vaani: a telephonic Hindi corpus spanning rural healthcare and agricultural helplines, as the closest available proxy for clinical speech under strict on-device constraints. We show that a robust multilingual model (IndicWav2Vec) degrades from 11.59\% WER on standard clean Hindi to 41.71\% WER on this proxy telephony data. We evaluate a progression of on-device adaptation regimes under realistic constraints, from full fine-tuning to parameter-efficient LoRA and stream-based continual learning, across multiple baselines, datasets, and seeds. Focusing on continual learning, our central finding highlights a critical interaction between Experience Replay (ER) and Elastic Weight Consolidation (EWC, parameterized by regularization strength λ). We show that standard positive EWC (λ> 0) can oppose replay-driven updates, limiting adaptation. Reversing EWC's strength (λ< 0) suggests that it can act as a directional control signal under ER-guided adaptation: negative λ reinforces replay-driven plasticity, while a scheduled λ enables phase-dependent control of stability and plasticity. Across evaluations on multiple datasets, we find that multi-domain replay provides a strong foundation for adaptation, while EWC modulates stability-plasticity dynamics without altering final performance. These results show that effective on-device adaptation depends on understanding how data-driven and parameter-level learning signals interact, rather than choosing methods in isolation.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…