Tabular GANs for uneven distribution

Abstract

Generative models for tabular data have evolved rapidly beyond Generative Adversarial Networks (GANs). While GANs pioneered synthetic tabular data generation, recent advances in diffusion models and large language models (LLMs) have opened new paradigms with complementary strengths in sample quality, privacy, and controllability. In this paper, we survey the landscape of tabular data generation across three major paradigms - GANs, diffusion models, and LLMs - and introduce a unified, modular framework that supports all three. The framework encompasses data preprocessing, a model-agnostic interface layer, standardized training and inference pipelines, and a comprehensive evaluation module. We validate the framework through experiments on seven benchmark datasets, demonstrating that GAN-based augmentation can improve downstream performance under distribution shift. The framework and its reference implementation are publicly available at https://github.com/Diyago/Tabular-data-generation, facilitating reproducibility and extensibility for future research.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…