An audio-to-analysis pipeline with certified transcription for information-theoretic profiling of the piano repertoire
Abstract
We present an audio-to-analysis pipeline that produces composer-level information-theoretic profiles : reflecting compositional vocabulary as it emerges from aggregated performances : from raw recordings, built on a transcription layer whose accuracy we certify on a standard benchmark (F1 = 0.9791 on the MAESTRO v3.0.0 test set). Applied to 1,238 pieces and 15 MAESTRO composers with at least ten attributed pieces, spanning the Baroque through the early twentieth century, the pipeline derives empirical distributions over harmonic scale degrees and analyzes them through Shannon entropy, asymmetric Kullback-Leibler divergence, and Zipfian rank-frequency modeling. The resulting profiles (i) order composers along an interpretable axis of harmonic predictability, with a narrow entropy range (3.33-3.86 bits) that reveals the marginal-level similarity of tonal vocabularies; (ii) recover known stylistic lineages (Haydn-Beethoven, Liszt-Rachmaninoff, Schubert-Schumann) through the smallest KL divergences in the corpus, with Mendelssohn emerging as a stable outlier within this corpus; and (iii) separate contemporary neoclassical artists (Richter, Frahm, Glass, Arnalds, J\'ohannsson) from historical composers on the quality of Zipfian fit to the transition distribution, with mean R2 = 0.78 for neoclassical versus 0.46 for historical (N ≥ 10 pieces each). This gap is larger than the spread within either group and is consistent with a minimalist compositional tendency: a compact transition vocabulary used with sharper frequency-rank regularity than historical composers. All estimates are reported with Laplace-smoothed bootstrap 95% confidence intervals.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.