scatteR: Generating instance space based on scagnostics

Abstract

Traditional synthetic data generation methods rely on model-based approaches that tune the parameters of a model rather than directly controlling the structure of the data itself. In contrast, Scagnostics (scatterplot diagnostics) is an exploratory graphical method that captures the structure of bivariate data using graph-theoretic measures. This paper presents scatteR, a novel data generation method that uses Scagnostics measurements to control the characteristics of the generated dataset. Using an iterative Generalized Simulated Annealing optimizer, scatteR searches for the arrangement of data points that minimizes the distance between the current and target Scagnostics measurements. The results show that scatteR can generate 50 data points in under 30 seconds with an average root mean squared error (RMSE) of 0.05, making it a useful pedagogical tool for teaching statistical methods. Overall, scatteR provides an entry point for generating datasets based on the characteristics of the instance space rather than on model-based simulations.
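The search loop described above (perturb points, recompute the measures, and keep arrangements that reduce the RMSE to a target) can be illustrated with a small sketch. This is not scatteR's implementation and makes several loudly stated assumptions: it uses plain simulated annealing with a linear cooling schedule instead of Generalized Simulated Annealing, and it replaces the full Scagnostics suite with two simplified stand-ins computed on raw points without the original binning step (Monotonic as the squared Spearman correlation, Skewed as a quantile ratio of minimum-spanning-tree edge lengths). All function names and parameter values (`anneal`, `t0`, `step`, `iters`) are hypothetical.

```python
# Hypothetical sketch: simplified Scagnostics-like measures plus plain
# simulated annealing. Not scatteR's code, and not Generalized Simulated Annealing.
import math
import random


def ranks(values):
    """Rank positions 0..n-1 (ties broken by order; fine for continuous data)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    r = [0.0] * len(values)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def monotonic(pts):
    """Stand-in for Monotonic: squared Spearman correlation of x and y."""
    rx = ranks([p[0] for p in pts])
    ry = ranks([p[1] for p in pts])
    m = (len(pts) - 1) / 2  # mean rank, identical for both axes
    cov = sum((a - m) * (b - m) for a, b in zip(rx, ry))
    var = sum((a - m) ** 2 for a in rx)  # same for rx and ry (both permutations)
    return (cov / var) ** 2 if var > 0 else 0.0


def mst_edge_lengths(pts):
    """Edge lengths of the Euclidean minimum spanning tree (Prim's algorithm)."""
    n = len(pts)
    in_tree = [False] * n
    best = [math.inf] * n
    best[0] = 0.0
    lengths = []
    for step in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if step > 0:
            lengths.append(best[u])  # edge connecting u to the tree
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(pts[u], pts[v])
                if d < best[v]:
                    best[v] = d
    return lengths


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def skewed(pts):
    """Stand-in for Skewed: (q90 - q50) / (q90 - q10) of MST edge lengths."""
    e = sorted(mst_edge_lengths(pts))
    q10, q50, q90 = (quantile(e, q) for q in (0.1, 0.5, 0.9))
    return (q90 - q50) / (q90 - q10) if q90 > q10 else 0.0


def measures(pts):
    return [monotonic(pts), skewed(pts)]


def loss(pts, target):
    """RMSE between the current and target measure vectors."""
    cur = measures(pts)
    return math.sqrt(sum((c - t) ** 2 for c, t in zip(cur, target)) / len(target))


def anneal(target, n=50, iters=3000, t0=0.05, step=0.05, seed=0):
    """Plain simulated annealing that nudges one point per iteration."""
    rng = random.Random(seed)
    pts = [(rng.random(), rng.random()) for _ in range(n)]
    cur = loss(pts, target)
    best_pts, best = list(pts), cur
    for k in range(iters):
        temp = t0 * (1 - k / iters) + 1e-9  # linear cooling schedule
        i = rng.randrange(n)
        x = pts[i][0] + rng.gauss(0, step)
        y = pts[i][1] + rng.gauss(0, step)
        if not (0 <= x <= 1 and 0 <= y <= 1):
            continue  # reject moves that leave the unit square
        cand = pts[:i] + [(x, y)] + pts[i + 1:]
        c = loss(cand, target)
        # Always accept improvements; accept worse moves with Boltzmann probability.
        if c <= cur or rng.random() < math.exp((cur - c) / temp):
            pts, cur = cand, c
            if cur < best:
                best_pts, best = list(pts), cur
    return best_pts, best
```

For example, `anneal([0.8, 0.3])` returns 50 points in the unit square together with the lowest RMSE reached. Tracking the best arrangement seen, rather than only the final one, is a common safeguard because annealing deliberately accepts some worse moves along the way.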
