Second-Order Asymptotically Optimal Statistical Classification
Abstract
Motivated by real-world machine learning applications, we analyze approximations to the non-asymptotic fundamental limits of statistical classification. In the binary version of this problem, given two training sequences generated according to two unknown distributions P1 and P2, one is tasked with classifying a test sequence that is known to be generated according to either P1 or P2. This problem can be thought of as an analogue of the binary hypothesis testing problem, except that in the present setting the generating distributions are unknown. Due to finite sample considerations, we consider the second-order asymptotic (or dispersion-type) tradeoff between type-I and type-II error probabilities for tests which ensure that (i) the type-I error probability for all pairs of distributions decays exponentially fast and (ii) the type-II error probability for a particular pair of distributions is non-vanishing. We generalize our results to classification of multiple hypotheses with the rejection option.