Auditory Attention Decoding with Task-Related Multi-View Contrastive Learning
Abstract
The human brain can easily focus on one speaker and suppress others in scenarios such as a cocktail party. Recently, researchers found that auditory attention can be decoded from the electroencephalogram (EEG) data. However, most existing deep learning methods are difficult to use prior knowledge of different views (that is attended speech and EEG are task-related views) and extract an unsatisfactory representation. Inspired by Broadbent's filter model, we decode auditory attention in a multi-view paradigm and extract the most relevant and important information utilizing the missing view. Specifically, we propose an auditory attention decoding (AAD) method based on multi-view VAE with task-related multi-view contrastive (TMC) learning. Employing TMC learning in multi-view VAE can utilize the missing view to accumulate prior knowledge of different views into the fusion of representation, and extract the approximate task-related representation. We examine our method on two popular AAD datasets, and demonstrate the superiority of our method by comparing it to the state-of-the-art method.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.