Büyük Dil Modelleri için TR-MMLU Benchmarkı: Performans Değerlendirmesi, Zorluklar ve İyileştirme Fırsatları (TR-MMLU Benchmark for Large Language Models: Performance Evaluation, Challenges, and Opportunities for Improvement)

Öner Aytaş

doi:10.1109/SIU66497.2025.11112154

Abstract

Language models have made significant advancements in understanding and generating human language, achieving remarkable success in various applications. However, evaluating these models remains a challenge, particularly for resource-limited languages like Turkish. To address this issue, we introduce the Turkish MMLU (TR-MMLU) benchmark, a comprehensive evaluation framework designed to assess the linguistic and conceptual capabilities of large language models (LLMs) in Turkish. TR-MMLU is based on a meticulously curated dataset comprising 6,200 multiple-choice questions across 62 sections within the Turkish education system. This benchmark provides a standard framework for Turkish NLP research, enabling detailed analyses of LLMs' capabilities in processing Turkish text. In this study, we evaluated state-of-the-art LLMs on TR-MMLU, highlighting areas for improvement in model design. TR-MMLU sets a new standard for advancing Turkish NLP research and inspiring future innovations.
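A multiple-choice benchmark organized into sections, as described above, is typically scored by comparing each predicted option with the answer key and reporting accuracy per section and overall. The following is a minimal sketch of that scoring step, not the authors' evaluation code. The record keys `section`, `answer`, and `prediction`, the function name `accuracy_by_section`, and the toy records are all hypothetical; the abstract does not specify TR-MMLU's data format or scoring procedure.

```python
from collections import defaultdict


def accuracy_by_section(records):
    """Return (per-section accuracy dict, overall accuracy).

    Each record is a dict with the hypothetical keys 'section',
    'answer' (the key's option label) and 'prediction' (the model's
    option label). These names are assumptions for illustration only.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for record in records:
        section = record["section"]
        total[section] += 1
        if record["prediction"] == record["answer"]:
            correct[section] += 1

    if not total:
        raise ValueError("no records to score")

    per_section = {s: correct[s] / total[s] for s in total}
    overall = sum(correct.values()) / sum(total.values())
    return per_section, overall


# Toy, made-up records for demonstration; these are not TR-MMLU questions.
toy_records = [
    {"section": "history", "answer": "A", "prediction": "A"},
    {"section": "history", "answer": "C", "prediction": "B"},
    {"section": "biology", "answer": "D", "prediction": "D"},
    {"section": "biology", "answer": "B", "prediction": "B"},
]

per_section, overall = accuracy_by_section(toy_records)
print(per_section)
print(overall)
```

Reporting per-section accuracy alongside the overall figure is one way to surface the subject areas where a model is weakest, which a single aggregate score would hide.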
