The IRMA Dataset: A Structured Audio-MIDI Corpus for Iranian Classical Music

Abstract

We present the IRMA Dataset (Iranian Radif MIDI Audio), a multi-level, open-access corpus designed for the computational study of Iranian classical music, with a particular emphasis on the radif, a structured repertoire of modal-melodic units central to pedagogy and performance. The dataset combines symbolic MIDI representations, phrase-level audio-MIDI alignment, musicological transcriptions in PDF format, and comparative tables of theoretical information curated from a range of performers and scholars. We outline the multi-phase construction process, including segment annotation, alignment methods, and a structured system of identifier codes to reference individual musical units. The current release includes the complete radif of Karimi; MIDI files and metadata from Mirza Abdollah's radif; selected segments from the vocal radif of Davami, as transcribed by Payvar and Fereyduni; and a dedicated section featuring audio-MIDI examples of tahrir ornamentation performed by prominent 20th-century vocalists. While the symbolic and analytical components are released under an open-access license (CC BY-NC 4.0), some referenced audio recordings and third-party transcriptions are cited using discographic information to enable users to locate the original materials independently, pending copyright permission. Serving both as a scholarly archive and a resource for computational analysis, this dataset supports applications in ethnomusicology, pedagogy, symbolic audio research, cultural heritage preservation, and AI-driven tasks such as automatic transcription and music generation. We welcome collaboration and feedback to support its ongoing refinement and broader integration into musicological and machine learning workflows.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…