Visualizing and Understanding Self-attention based Music Tagging
Abstract
Recently, we proposed a self-attention based music tagging model. Different from most of the conventional deep architectures in music information retrieval, which use stacked 3x3 filters by treating music spectrograms as images, the proposed self-attention based model attempted to regard music as a temporal sequence of individual audio events. Not only the performance, but it could also facilitate better interpretability. In this paper, we mainly focus on visualizing and understanding the proposed self-attention based music tagging model.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.