Implementing Automated Data Validation for Canadian Political Datasets

Abstract

This paper describes a series of automated data validation tests for datasets detailing charity financial information, political donations, and government lobbying in Canada. We motivate and document a series of 200 tests that check the validity, internal consistency, and external consistency of these datasets. We present preliminary findings after application of these tests to the political donations (≈10.1 million observations) and lobbying (≈711,200 observations) datasets, and to a sample of ≈380,880 observations from the charities datasets. We conclude with areas for future work and lessons learnt for others looking to implement automated data validation in their own workflows.
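As a minimal sketch of the three categories of test named above (validity, internal consistency, external consistency), the following Python applies one check of each kind to a few invented donation records. The field names, the reference list of parties, and the checks themselves are assumptions made for illustration. They are not the paper's tests or the schema of its datasets.

```python
from datetime import date

# Hypothetical reference list standing in for an external source
# (for example, a register of recognised parties). Invented values.
KNOWN_PARTIES = {"Party X", "Party Y"}


def check_validity(row):
    """Validity: each value is plausible on its own."""
    return (
        row["monetary"] >= 0
        and row["non_monetary"] >= 0
        and row["received"] <= date.today()
    )


def check_internal(row):
    """Internal consistency: fields within one record agree with each other."""
    return abs(row["monetary"] + row["non_monetary"] - row["total"]) < 0.005


def check_external(row):
    """External consistency: the record agrees with an outside reference."""
    return row["party"] in KNOWN_PARTIES


CHECKS = {
    "validity": check_validity,
    "internal": check_internal,
    "external": check_external,
}


def run_checks(rows):
    """Return a mapping from check name to the indices of failing rows."""
    return {
        name: [i for i, row in enumerate(rows) if not check(row)]
        for name, check in CHECKS.items()
    }


# Invented records: row 0 is clean, row 1 has a negative amount,
# row 2 has a total that does not add up and an unknown party.
rows = [
    {"monetary": 100.0, "non_monetary": 0.0, "total": 100.0,
     "received": date(2020, 1, 15), "party": "Party X"},
    {"monetary": -5.0, "non_monetary": 0.0, "total": -5.0,
     "received": date(2020, 2, 1), "party": "Party Y"},
    {"monetary": 50.0, "non_monetary": 25.0, "total": 70.0,
     "received": date(2020, 3, 9), "party": "Party Z"},
]

failures = run_checks(rows)
print(failures)
```

Keeping each check as a small function that returns pass or fail for a single record makes it straightforward to report failures per test, which is one plausible way to organise a large suite of such tests.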
