Using Ramsey theory to measure unavoidable spurious correlations in Big Data

Abstract

Given a dataset, we quantify how many patterns must always exist in it. Formally, we do this through the lens of Ramsey theory of graphs and a quantitative bound known as Goodman's theorem. Combining statistical tools with Ramsey theory of graphs gives a nuanced understanding of how far a dataset is from random and of what qualifies as a meaningful pattern. We apply this method to a dataset of repeated voters in the 1984 US Congress to quantify how homogeneous a subset of congressional voters is, and we also measure how transitive a subset of voters is. Finally, we apply statistical Ramsey theory to global economic trading data to provide evidence that global markets are quite transitive.
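
To make the role of Goodman's theorem concrete, here is a minimal Python sketch of the kind of count it bounds. The theorem gives a lower bound on the number of monochromatic triangles in any two-coloring of the edges of the complete graph on n vertices, so that many "patterns" are unavoidable regardless of the data. The function names and the encoding of a dataset as a set of "red" edges are illustrative assumptions, not the paper's construction; the abstract does not specify how the voting or trading data are turned into a graph.

```python
from itertools import combinations


def goodman_lower_bound(n):
    """Goodman's minimum number of monochromatic triangles in a 2-colored K_n."""
    if n % 2 == 0:
        u = n // 2
        return u * (u - 1) * (u - 2) // 3
    if n % 4 == 1:
        u = (n - 1) // 4
        return 2 * u * (u - 1) * (4 * u + 1) // 3
    u = (n - 3) // 4  # remaining case: n % 4 == 3
    return 2 * u * (u + 1) * (4 * u - 1) // 3


def count_monochromatic_triangles(n, red_edges):
    """Brute-force count of triangles whose three edges share one color.

    red_edges holds pairs (i, j) with i < j; every other pair is "blue".
    (Hypothetical encoding: red could mean "these two items are related".)
    """
    count = 0
    for a, b, c in combinations(range(n), 3):
        colors = {(a, b) in red_edges, (a, c) in red_edges, (b, c) in red_edges}
        if len(colors) == 1:  # all three red or all three blue
            count += 1
    return count


# Pentagon coloring of K_5: the red edges form a 5-cycle, and so do the blue ones.
pentagon = {(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)}
observed = count_monochromatic_triangles(5, pentagon)
unavoidable = goodman_lower_bound(5)
print(observed, unavoidable)  # prints "0 0"
```

Under this sketch, the excess of the observed count over the bound is one rough way to read how far a coloring sits from the unavoidable minimum; whether the paper uses exactly this statistic is not stated in the abstract.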
