A modern framework for jet tagger development
Abstract
This paper presents a new tool that performs the various steps of jet tagger development efficiently and comprehensively. A common data structure is used both for training and for performance evaluation in data. Taking typical CMS experiment pipelines as a reference, the framework reduces the amount of data that must be stored to accomplish the same tasks, and it shortens the wait between algorithm development and the availability of data-to-simulation results from months to days. Proper utilization of high-throughput systems enables first data-to-simulation studies with a recent neural network architecture, Particle Transformer, adapted to jet flavour tagging. Unlike the official implementations of the collaboration, the new framework allows different variants, such as different training paradigms, and their impact on data/simulation agreement to be investigated within a single run of the analysis framework, without producing any new large files on disk. Besides being more time- and storage-efficient, which makes the first results of this kind available just a few hours after neural network training finishes, the framework is currently the only realization capable of studying how adversarial techniques affect data/simulation agreement for tagger algorithm outputs as well as inputs.