Physics-Informed Neural Operator for Speech Production Analysis

Abstract

Physics-informed neural operators (PINOs) have recently gained attention as fast numerical simulators with potential for solving inverse problems. This study proposes the first PINO-based method for speech production analysis. The model learns the governing one-dimensional wave equations directly, without requiring pre-computed supervised training data. Using vocal tract shape data as input features, we compare the model's predicted fundamental frequency (f0), glottal volume velocity, and sound pressure at the lips for five static vowels against a conventional Runge-Kutta/finite-difference approach. With errors of 0.8% for glottal volume velocity and 3.2% for speech waveforms, the proposed model enables efficient GPU-parallelized simulation without iterative calculations. We conclude that PINO is a promising approach for fast analysis of speech.
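To illustrate what learning the governing equations "without pre-computed supervised training data" can mean, the following is a minimal sketch of a physics residual loss, not the authors' implementation. It assumes the plain lossless wave equation u_tt = c^2 u_xx discretized with central finite differences; the abstract does not specify the exact equations, the glottal source model, the network architecture, or the loss. The function names and the values of LENGTH and SOUND_SPEED are hypothetical and chosen only for illustration. In a physics-informed setup, a candidate field produced by the model would be scored this way, so no reference simulation is needed as a training target.

```python
import math

def wave_residual_loss(u, dx, dt, c):
    """Mean squared residual of u_tt - c^2 * u_xx over interior grid points.

    u[n][i] holds the field at time index n and position index i.
    Central differences approximate both second derivatives.
    """
    total, count = 0.0, 0
    for n in range(1, len(u) - 1):
        for i in range(1, len(u[n]) - 1):
            u_tt = (u[n + 1][i] - 2.0 * u[n][i] + u[n - 1][i]) / dt**2
            u_xx = (u[n][i + 1] - 2.0 * u[n][i] + u[n][i - 1]) / dx**2
            residual = u_tt - c**2 * u_xx
            total += residual * residual
            count += 1
    return total / count

def standing_wave(nx, nt, dx, dt, length, field_speed):
    """Sample sin(pi x / L) * cos(pi c t / L), which solves the wave
    equation exactly when field_speed equals the true sound speed."""
    k = math.pi / length
    return [
        [math.sin(k * i * dx) * math.cos(k * field_speed * n * dt)
         for i in range(nx)]
        for n in range(nt)
    ]

# Illustrative values only (assumptions, not taken from the paper).
LENGTH = 0.17        # tube length in metres
SOUND_SPEED = 350.0  # speed of sound in m/s
NX, NT = 51, 201
DX = LENGTH / (NX - 1)
DT = 1e-3 / (NT - 1)

# A field that obeys the equation gives a small residual ...
u_matched = standing_wave(NX, NT, DX, DT, LENGTH, SOUND_SPEED)
loss_matched = wave_residual_loss(u_matched, DX, DT, SOUND_SPEED)

# ... while a field evolving at the wrong speed is penalized heavily.
u_mismatched = standing_wave(NX, NT, DX, DT, LENGTH, 2.0 * SOUND_SPEED)
loss_mismatched = wave_residual_loss(u_mismatched, DX, DT, SOUND_SPEED)

print(loss_matched < loss_mismatched)
```

The matched field still has a nonzero loss because the finite-difference stencils carry truncation error. In training, a loss of this kind would be minimized with respect to the model parameters, typically alongside boundary and initial condition terms.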
