voxmap-studio: An open-source speaker diarization annotation tool with built-in cost instrumentation

Abstract

Labeling speaker diarization data is costly, yet annotation tools rarely measure that cost. We present voxmap-studio, an open-source, React-based diarization annotation tool integrated with the pyannote-based diarization ecosystem. Its canvas is initialized by a fast stride-accelerated diarization engine, so the annotator corrects a hypothesis rather than drawing every speaker turn by hand. The tool records annotation cost (typed edit-operation counts and time) as a first-class output, enabling quantitative comparison of how much different forms of assistance actually help. Export is gated on per-segment human confirmation and guarded by injected "phantom" attention checks, which together prevent unverified automatic output from being released as ground truth. In a preliminary study on nine AMI audio files, unassisted manual annotation was the costliest and least accurate condition, and automatic initialization shifted the work from creating turns to correcting them; highlighting uncertain segments gave the lowest cost in our small sample. The tool and its instrumentation are open source.
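
To make the two mechanisms named above concrete, the following is a minimal sketch of a cost log and an export gate. It is not taken from voxmap-studio: the names `Segment`, `CostLog`, and `exportable`, the operation labels, and the rule that a confirmed phantom fails the attention check are all assumptions made for illustration, and the sketch is written in Python rather than in the tool's React front end.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import List


@dataclass
class Segment:
    """One speaker turn on the canvas (hypothetical structure)."""
    start: float
    end: float
    speaker: str
    confirmed: bool = False  # set once a human has reviewed the segment
    phantom: bool = False    # injected attention check, not real speech


@dataclass
class CostLog:
    """Typed edit-operation counts plus accumulated annotation time."""
    ops: Counter = field(default_factory=Counter)
    seconds: float = 0.0

    def record(self, op_type: str, duration_s: float) -> None:
        # Count the operation under its type and add the time it took.
        self.ops[op_type] += 1
        self.seconds += duration_s


def exportable(segments: List[Segment]) -> List[Segment]:
    """Return the real segments if the export gate passes, else raise."""
    # Assumed rule: confirming a phantom means the annotator did not look.
    if any(s.phantom and s.confirmed for s in segments):
        raise ValueError("attention check failed: phantom segment confirmed")
    real = [s for s in segments if not s.phantom]
    # Assumed rule: every real segment needs explicit human confirmation.
    if not all(s.confirmed for s in real):
        raise ValueError("export blocked: unconfirmed segments remain")
    return real
```

Under these assumptions, comparing two assistance conditions reduces to comparing their `CostLog` totals per operation type, and automatic output can only reach the export step after each real segment has been confirmed.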
