Consistency and Reproducibility of Grades in Higher Education: A Case Study in Deep Learning

Abstract

Evaluating student performance in higher education is essential for gauging the effectiveness of teaching methods and for promoting equal opportunity for all students. In this study, we investigate the correlation between the grading practices of two teachers in a master's-level deep learning course offered at CentraleSupélec. The two teachers, who have distinct teaching styles, were both responsible for marking the students' final project oral presentations. Our results indicate a significant positive correlation (0.76) between the two teachers' grades, suggesting that their assessments of student performance are consistent. However, although the grades are consistent with each other, they do not appear to be fully reproducible from one examiner to the other, which points to serious drawbacks of relying on a single examiner for oral project assessments. Furthermore, we observed that the maximum difference between the grades assigned by the two examiners was 12.5%, with a mean of 6.3% (and a median of 5.0%), highlighting the potential impact of inter-examiner variability on students' final grades.
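The two kinds of figures quoted above (a correlation coefficient between examiners, and max/mean/median grade differences expressed as percentages) can be computed along the lines of the minimal sketch below. It rests on assumptions not stated in the abstract: that the reported correlation is a Pearson coefficient, that grades are on a 0 to 20 scale (the `scale_max` parameter), and that each percentage is the absolute difference divided by that scale maximum. The function names and the grade lists in the usage example are hypothetical illustrations, not the study's data or code.

```python
from statistics import mean, median


def pearson_r(x, y):
    """Pearson correlation between two examiners' grades (assumed metric)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length grade lists with at least two entries")
    mx, my = mean(x), mean(y)
    # Covariance numerator and the two standard-deviation terms (unnormalized).
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def difference_stats(x, y, scale_max=20.0):
    """Absolute inter-examiner differences as a percentage of the grade scale.

    scale_max is a hypothetical parameter: the abstract does not state the scale.
    """
    diffs = [abs(a - b) * 100 / scale_max for a, b in zip(x, y)]
    return {"max": max(diffs), "mean": mean(diffs), "median": median(diffs)}


if __name__ == "__main__":
    # Hypothetical grades for five students, NOT taken from the study.
    examiner_a = [14, 16, 12, 18, 15]
    examiner_b = [15, 16, 10, 17, 14]
    print(round(pearson_r(examiner_a, examiner_b), 3))
    print(difference_stats(examiner_a, examiner_b))
```

Note that a high correlation and a non-zero mean difference are compatible: two examiners can rank students almost identically while one consistently grades a point or two apart, which is the distinction the abstract draws between consistency and reproducibility.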
