Presenting a classifier to detect research contributions in OpenAlex
Abstract
This paper introduces a document type classifier designed to improve the distinction between research and non-research journal publications in OpenAlex. Based on open metadata, the classifier detects non-research or editorial content (e.g. paratexts, abstracts, editorials, letters) among works that OpenAlex classifies as articles and reviews. The classifier achieves an F1-score of 0.95, indicating that applying it to real data could improve the data quality of bibliometric research based on OpenAlex. In total, the classifier reclassified 4,589,967 out of 42,701,863 articles and reviews as non-research contributions, a share of 10.75%.
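The two figures reported in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' code: the function names are hypothetical, and the precision and recall values are illustrative assumptions, since the abstract reports only the resulting F1-score.

```python
def reclassified_share(reclassified: int, total: int) -> float:
    """Return the percentage of works reclassified as non-research."""
    return 100.0 * reclassified / total


def f1_score(precision: float, recall: float) -> float:
    """Return the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Counts taken from the abstract.
share = reclassified_share(4_589_967, 42_701_863)
print(f"{share:.2f}%")  # prints 10.75%

# Hypothetical precision and recall, chosen only to illustrate
# how an F1-score near 0.95 can arise; not values from the paper.
example_f1 = f1_score(precision=0.94, recall=0.96)
print(f"{example_f1:.2f}")  # prints 0.95
```

Because F1 is a harmonic mean, it is pulled towards the lower of precision and recall, so a score of 0.95 implies that both must be reasonably high.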