LEMURS dataset: Large-scale multi-detector ElectroMagnetic Universal Representation of Showers

Abstract

We present LEMURS, an extensive dataset of simulated calorimeter showers designed to support the development and benchmarking of fast simulation methods in high-energy physics, most notably as a step towards foundation models. The new dataset is more robust than the well-established CaloChallenge dataset 2: it features substantially greater statistics, a wider range of incident angles in the detector, and, most crucially, multiple detector geometries (including more realistic calorimeters). The dataset is provided in HDF5 format, with a file structure inspired by the CaloChallenge shower representation but including additional variables. The scale and diversity of LEMURS make it particularly suitable for developing foundation models, and it has been used in the CaloDiT-2 model, a pre-trained model released in the community-standard simulation toolkit Geant4 (version 11.4.beta). All data and code for generation and analysis are openly accessible, facilitating reproducibility and reuse across the community.
