An Acoustic Landmark Database of the English Lexicon via Articulatory Synthesis

Abstract

Acoustic landmark theory treats speech as organized around the acoustic consequences of articulatory gestures that shape the vocal tract and airflow. Progress is limited by the scarcity of large, unambiguously annotated landmark datasets. We invert the problem by generating speech from landmark patterns. Using the Pink Trombone physical vocal-tract synthesizer, we produce an English lexicon for two adult configurations (male and female). Because we control the gestures directly, we place landmark labels algorithmically at the exact times of the corresponding physical events (e.g., oral closures and releases). The corpus contains more than 200,000 synthesized words, rendered for both configurations with time-aligned annotations; intelligibility is measured with the short-time objective intelligibility (STOI) metric. We use the corpus to compute lexicon-wide statistics from an articulatory-event perspective, reporting landmark frequencies and dominant cue patterns. It also enables quantitative studies and the training and benchmarking of automatic landmark detectors.
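
To make the idea of algorithmic landmark placement concrete, here is a minimal sketch, not the authors' implementation. It assumes the synthesizer exposes a per-frame constriction aperture for one articulator, where a value at or below a threshold means the oral tract is closed. The function name `label_closure_landmarks`, the `closed_below` threshold, and the frame rate are all hypothetical and chosen only for illustration.

```python
from typing import List, Tuple


def label_closure_landmarks(
    aperture: List[float],
    frame_rate_hz: float,
    closed_below: float = 0.0,
) -> List[Tuple[float, str]]:
    """Return (time_in_seconds, label) pairs for oral closures and releases.

    `aperture` is a hypothetical per-frame constriction opening; a frame
    counts as closed when its value is <= `closed_below`.
    """
    landmarks: List[Tuple[float, str]] = []
    if not aperture:
        return landmarks

    was_closed = aperture[0] <= closed_below
    for i in range(1, len(aperture)):
        is_closed = aperture[i] <= closed_below
        if is_closed and not was_closed:
            # First closed frame after an open frame: closure landmark.
            landmarks.append((i / frame_rate_hz, "closure"))
        elif was_closed and not is_closed:
            # First open frame after a closed frame: release landmark.
            landmarks.append((i / frame_rate_hz, "release"))
        was_closed = is_closed
    return landmarks


# Toy trajectory: the tract closes at frame 2 and reopens at frame 4.
example = label_closure_landmarks(
    [1.0, 0.5, 0.0, 0.0, 0.4, 1.0], frame_rate_hz=100.0
)
```

Because the labels are read off the control signal rather than estimated from audio, their timing is exact up to the frame resolution, which is the property the abstract relies on.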
