An Environmental Feature Representation in I-vector Space for Room Verification and Metadata Estimation

Abstract

This paper investigates the application of environmental feature representations for room verification tasks and acoustic meta-data estimation. Audio recordings contain both speaker and non-speaker information. We refer to the non-speaker-related information, including channel and other environmental factors, as e-vectors. I-vectors, commonly used in speaker identification, are extracted in the total variability space and capture both speaker and channel-environment information without discrimination. Accordingly, e-vectors can be extracted from i-vectors using methods such as linear discriminant analysis. In this paper, we first demonstrate that e-vectors can be successfully applied to room verification tasks with a low equal error rate. Second, we propose two methods for estimating metadata information -- signal-to-noise (SNR) and reverberation (T60) -- from these e-vectors. When comparing our system to contemporary global SNR estimation methods, in terms of accuracy, we perform favorably even with low dimensional i-vectors. Lastly, we show that room verification tasks can be improved if e-vectors are augmented with the extracted metadata information.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…