GAUSS: Benchmarking Structured Mathematical Skills for Large Language Models

Abstract

We introduce GAUSS (General Assessment of Underlying Structured Skills in Mathematics), a benchmark that evaluates LLMs' mathematical abilities across twelve core skill dimensions, grouped into three domains: knowledge and understanding, problem solving and communication, and meta-skills and creativity. By categorizing problems according to cognitive skills and designing tasks that isolate specific abilities, GAUSS constructs comprehensive, fine-grained, and interpretable profiles of models' mathematical abilities. These profiles faithfully represent the models' underlying mathematical intelligence. To illustrate how the benchmark can be used, we derive the skill profile of GPT-5-thinking, revealing its strengths and weaknesses as well as its differences relative to o4-mini-high, and thereby underscoring the value of multidimensional, skill-based evaluation.
