Towards "all-inclusive" Data Preparation to ensure Data Quality

Abstract

Data preparation, especially data cleaning, is essential for ensuring data quality and improving the output of automated decision systems. Since no single tool covers all the required steps, a combination of tools, namely a data preparation pipeline, is required. Such a process comes with a number of challenges. We outline these challenges and describe the tasks we plan to analyze in future research to address them. We also introduce in detail a test data generator that we implemented as the basis for this future work.
