Designing a Data Science simulation with MERITS: A Primer

Abstract

Simulations play a crucial role in the modern scientific process. Yet despite (or due to) this ubiquity, the Data Science community shares neither a comprehensive definition for a "high-quality" study nor a consolidated guide to designing one. Inspired by the Predictability-Computability-Stability (PCS) framework for 'veridical' Data Science, we propose six MERITS that a simulation study should satisfy. (Modularity and Efficiency support the computability of a study, encouraging clean and flexible implementation. Realism and Stability address the conceptualization of the research problem: How well does a study predict reality, such that its conclusions generalize to new data/contexts? Finally, Intuitiveness and Transparency encourage good communication and trustworthiness of study design and results.) Drawing an analogy between simulation and cooking, we moreover offer (a) a conceptual framework for thinking about the anatomy of a simulation 'recipe'; (b) a baker's dozen in guidelines to aid the Data Science practitioner in designing one; and (c) a case study demonstrating the practical utility of our framework by using it to autopsy a preexisting simulation study. With this "PCS primer" for high-quality Data Science simulation, we seek to distill and enrich the best practices of simulation across disciplines into a cohesive recipe for trustworthy, veridical Data Science.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…