Computing Approximate Statistical Discrepancy

Abstract

Consider a geometric range space $(X, \mathcal{A})$ where each data point $x \in X$ has two or more values (say $r(x)$ and $b(x)$). Also consider a function $\Phi(A)$, defined for any range $A \in (X, \mathcal{A})$ in terms of the sums of values in that range, e.g., $r_A = \sum_{x \in A} r(x)$ and $b_A = \sum_{x \in A} b(x)$. The $\Phi$-maximum range is $A^* = \arg\max_{A \in (X, \mathcal{A})} \Phi(A)$. Our goal is to find some $\hat{A}$ such that $|\Phi(\hat{A}) - \Phi(A^*)| \leq \varepsilon$. We develop algorithms for this problem for range spaces with bounded VC-dimension, as well as significant improvements for those defined by balls, halfspaces, and axis-aligned rectangles. This problem has applications in many areas, including discrepancy evaluation, classification, and spatial scan statistics.
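To make the problem statement concrete, here is a minimal brute-force sketch in Python. It is not one of the algorithms the abstract refers to. It assumes a one-dimensional point set, closed intervals as the ranges, and a hypothetical choice of the function as the absolute difference of the two sums; the names `phi` and `max_interval` are illustrative only.

```python
from itertools import combinations_with_replacement


def phi(r_a, b_a):
    # Hypothetical choice of the function: absolute difference of the two sums.
    return abs(r_a - b_a)


def max_interval(points):
    """Exhaustively search closed intervals [lo, hi] whose endpoints are data
    coordinates and return (best_score, (lo, hi)).

    points: list of (x, r, b) tuples, where r and b are the two values of x.
    """
    xs = sorted({x for x, _, _ in points})
    best = (float("-inf"), None)
    for lo, hi in combinations_with_replacement(xs, 2):  # guarantees lo <= hi
        r_a = sum(r for x, r, _ in points if lo <= x <= hi)  # sum of r over the range
        b_a = sum(b for x, _, b in points if lo <= x <= hi)  # sum of b over the range
        score = phi(r_a, b_a)
        if score > best[0]:
            best = (score, (lo, hi))
    return best


# Toy data (made up for illustration): r and b each sum to 1 over all points.
points = [(0.1, 0.0, 0.25), (0.3, 0.5, 0.25), (0.4, 0.5, 0.0), (0.8, 0.0, 0.5)]
best_score, best_range = max_interval(points)
print(best_score, best_range)  # 0.75 (0.3, 0.4)
```

On this toy input the exact maximum is 0.75, attained by the interval [0.3, 0.4]. Under the approximation goal stated above, any interval scoring at least 0.75 minus the error tolerance would be an acceptable answer, which is what allows faster methods than exhaustive search.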
