rta-dq-lib: a software library to perform online data quality analysis of scientific data
Abstract
The Cherenkov Telescope Array (CTA) is an initiative that is currently building the largest ground-based gamma-ray observatory ever constructed. A Science Alert Generation (SAG) system, part of the Array Control and Data Acquisition (ACADA) system of the CTA Observatory, analyses the telescope data online, as it arrives at an event rate of tens of kHz, to detect transient gamma-ray events. The SAG system also performs an online data quality analysis to assess the instruments' health during data acquisition: this analysis is crucial to confirm good detections. A Python and a C++ software library to perform the online data quality analysis of CTA data, called rta-dq-lib, has been proposed for CTA. The Python version is dedicated to the rapid prototyping of data quality use cases. The C++ version is optimized for maximum performance. The library allows the user to define, through XML configuration files, the format of the input data and, for each data field, which quality checks must be performed and which types of aggregations and transformations must be applied. It internally translates the XML configuration into a directed acyclic computational graph that encodes the dependencies between the computational tasks to be performed. This model allows the library to easily take advantage of parallelization at the thread level. The overall flexibility allows us to develop generic data quality analysis pipelines that could also be reused in other applications.
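The translation from an XML configuration to a dependency graph of tasks can be illustrated with a minimal Python sketch. It is not the rta-dq-lib API: the XML element and attribute names (`node`, `op`, `depends`), the operation names, and the helper functions `parse_config`, `run_node` and `execute` are all assumptions invented for illustration. The sketch uses only the Python standard library (3.9 or later) and runs tasks whose dependencies are satisfied on a thread pool.

```python
import xml.etree.ElementTree as ET
from concurrent.futures import ThreadPoolExecutor
from graphlib import TopologicalSorter

# Hypothetical configuration: the element and attribute names are invented
# for this sketch and do not reflect the real rta-dq-lib schema.
CONFIG = """
<dq>
  <node id="charge" op="input"/>
  <node id="charge_in_range" op="range" min="0" max="100" depends="charge"/>
  <node id="charge_mean" op="mean" depends="charge"/>
  <node id="summary" op="collect" depends="charge_in_range,charge_mean"/>
</dq>
"""


def parse_config(xml_text):
    """Return {node_id: (attributes, [dependency ids])} from the XML text."""
    nodes = {}
    for el in ET.fromstring(xml_text).iter("node"):
        deps = [d for d in el.get("depends", "").split(",") if d]
        nodes[el.get("id")] = (dict(el.attrib), deps)
    return nodes


def run_node(attrs, inputs, data):
    """Execute one task given the results of its dependencies."""
    op = attrs["op"]
    if op == "input":      # read a raw data field
        return data[attrs["id"]]
    if op == "range":      # quality check: all values within [min, max]
        lo, hi = float(attrs["min"]), float(attrs["max"])
        return all(lo <= v <= hi for v in inputs[0])
    if op == "mean":       # aggregation: arithmetic mean
        return sum(inputs[0]) / len(inputs[0])
    if op == "collect":    # gather upstream results in declared order
        return list(inputs)
    raise ValueError(f"unknown op: {op}")


def execute(nodes, data, max_workers=4):
    """Run the graph, submitting each batch of ready tasks to a thread pool."""
    sorter = TopologicalSorter({nid: deps for nid, (_, deps) in nodes.items()})
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        while sorter.is_active():
            # Every node returned here has all of its dependencies computed,
            # so the whole batch can run concurrently.
            futures = {
                nid: pool.submit(
                    run_node,
                    nodes[nid][0],
                    [results[d] for d in nodes[nid][1]],
                    data,
                )
                for nid in sorter.get_ready()
            }
            for nid, fut in futures.items():
                results[nid] = fut.result()
                sorter.done(nid)
    return results


results = execute(parse_config(CONFIG), {"charge": [10.0, 20.0, 30.0]})
```

In this sketch `charge_in_range` and `charge_mean` depend only on `charge`, so they become ready together and are submitted to the pool in the same batch; `summary` runs once both have finished. A production implementation would likely schedule at a finer grain and, in Python, would need to account for the global interpreter lock when the tasks are CPU-bound.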