Regularizing random points by deleting a few
Abstract
It is well understood that if one is given a set $X \subset [0,1]$ of $n$ independent uniformly distributed random variables, then $$\sup_{0 \leq x \leq 1} \left| \frac{\# (X \cap [0,x])}{\# X} - x \right| \lesssim \frac{\sqrt{\log n}}{\sqrt{n}}$$ with very high probability. We show that one can improve the error term by removing a few of the points. For any $m \leq 0.001n$ there exists a subset $Y \subset X$, obtained by deleting at most $m$ points, so that the error term drops from $\sim \sqrt{\log n}/\sqrt{n}$ to $\log(n)/m$ with high probability. When $m = cn$ for a small $0 \leq c \leq 0.001$, this achieves the essentially optimal asymptotic order of discrepancy $\log(n)/n$. The proof is constructive and works in an online setting (where one is given the points sequentially, one at a time, and has to decide whether to keep or discard each one). A change of variables shows the same result for any random variables on the real line with absolutely continuous density.
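As a rough illustration of the quantity being bounded, the sketch below computes the largest gap between the empirical distribution of a point set in [0,1] and the uniform distribution, and prints it next to the classical scale sqrt(log(n)/n) for a random sample. It does not implement the deletion procedure from the paper; the function name `sup_discrepancy`, the sample size, and the fixed seed are all hypothetical choices made for this example.

```python
import math
import random


def sup_discrepancy(points):
    """Return sup over 0 <= x <= 1 of | #(X ∩ [0,x]) / #X - x | for points in [0,1]."""
    xs = sorted(points)
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one point")
    worst = 0.0
    for i, x in enumerate(xs, start=1):
        # The empirical distribution jumps from (i-1)/n to i/n at the i-th
        # smallest point, so the supremum is reached on one side of a jump.
        worst = max(worst, i / n - x, x - (i - 1) / n)
    return worst


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed, purely for reproducibility
    n = 10_000              # illustrative sample size, not taken from the paper
    sample = [rng.random() for _ in range(n)]
    # Print the observed discrepancy next to the sqrt(log(n)/n) scale.
    print(sup_discrepancy(sample), math.sqrt(math.log(n) / n))
```

For comparison, n equally spaced midpoints (i - 1/2)/n have discrepancy exactly 1/(2n), which is the kind of regularity that a random sample lacks and that deleting points is meant to move toward.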