PadAug: Robust Speaker Verification with Simple Waveform-Level Silence Padding

Abstract

The presence of non-speech segments in utterances often leads to the performance degradation of speaker verification. Existing systems usually use voice activation detection as a preprocessing step to cut off long silence segments. However, short silence segments, particularly those between speech segments, still remain a problem for speaker verification. To address this issue, in this paper, we propose a simple wave-level data augmentation method, PadAug, which aims to enhance the system's robustness to silence segments. The core idea of PadAug is to concatenate silence segments with speech segments at the waveform level for model training. Due to its simplicity, it can be directly applied to the current state-of-the art architectures. Experimental results demonstrate the effectiveness of the proposed PadAug. For example, applying PadAug to ResNet34 achieves a relative equal error rate reduction of 5.0\% on the voxceleb dataset. Moreover, the PadAug based systems are robust to different lengths and proportions of silence segments in the test data.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…