Feature Selection and Junta Testing are Statistically Equivalent
Abstract
For a function $f \colon \{0,1\}^n \to \{0,1\}$, the junta testing problem asks whether $f$ depends on only $k$ variables. If $f$ depends on only $k$ variables, the feature selection problem asks to find those variables. We prove that these two tasks are statistically equivalent. Specifically, we show that the ``brute-force'' algorithm, which checks for any set of $k$ variables consistent with the sample, is simultaneously sample-optimal for both problems, and the optimal sample size is \[ \Theta\left( \frac{1}{\varepsilon} \left( \sqrt{2^k \log \binom{n}{k}} + \log \binom{n}{k} \right) \right). \]
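As an illustration of the ``brute-force'' check described above, the following is a minimal Python sketch, not the authors' implementation. It assumes the sample is a list of pairs `(x, y)`, where `x` is a tuple of `n` bits and `y` is the label; the function name `find_consistent_set` and this input format are hypothetical. A set of `k` variables is treated as consistent with the sample when no two sample points agree on those `k` coordinates but carry different labels.

```python
from itertools import combinations, product


def find_consistent_set(sample, n, k):
    """Return the first k-subset of range(n) consistent with `sample`, else None.

    `sample` is a list of (x, y) pairs: x is a tuple of n bits, y is a 0/1 label.
    A subset is consistent if the labels are a function of x restricted to it.
    (Hypothetical helper for illustration; names are not from the paper.)
    """
    for subset in combinations(range(n), k):
        seen = {}  # maps restricted input -> label seen first
        consistent = True
        for x, y in sample:
            key = tuple(x[i] for i in subset)
            # setdefault stores y on first sight, otherwise returns the old label
            if seen.setdefault(key, y) != y:
                consistent = False
                break
        if consistent:
            return subset
    return None


# Usage: f(x) = x[0] XOR x[2] on n = 4 bits depends on exactly 2 variables.
points = [(x, x[0] ^ x[2]) for x in product((0, 1), repeat=4)]
print(find_consistent_set(points, 4, 2))  # (0, 2)
print(find_consistent_set(points, 4, 1))  # None
```

Under this reading, a tester would accept when some consistent set exists, and a feature selector would output that set. The loop visits up to $\binom{n}{k}$ subsets, so the sketch illustrates the statistical procedure only and makes no claim about computational efficiency.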