GRASP: A Goodness-of-Fit Test for Classification Learning
Abstract
The performance of classifiers is often measured by average accuracy on test data. Although it is a standard measure, average accuracy fails to characterize how well the model fits the underlying conditional law of the labels given the feature vector (Y|X), e.g. due to model misspecification, overfitting, and high dimensionality. In this paper, we consider the fundamental problem of assessing goodness-of-fit for a general binary classifier. Our framework makes no parametric assumption on the conditional law Y|X and treats it as a black-box oracle model that can be accessed only through queries. We formulate the goodness-of-fit assessment problem as a tolerance hypothesis test of the form \[ H_0: \mathbb{E}\big[D_f\big(\mathrm{Bern}(\eta(X)) \,\|\, \mathrm{Bern}(\hat{\eta}(X))\big)\big] \le \tau\,, \] where \(D_f\) represents an f-divergence function, and \(\eta(x)\) and \(\hat{\eta}(x)\) denote, respectively, the true and the estimated likelihood that a feature vector x admits a positive label. We propose a novel test, called GRASP, for testing \(H_0\), which works in finite-sample settings regardless of the feature distribution (distribution-free). We also propose model-X GRASP, designed for model-X settings where the joint distribution of the feature vector is known. Model-X GRASP uses this distributional information to achieve better power. We evaluate the performance of our tests through extensive numerical experiments.
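To make the quantity inside the null hypothesis concrete, the following minimal Python sketch averages a divergence between Bern(η(x)) and Bern(η̂(x)) over a handful of feature points and compares it with a tolerance τ. It is an illustration under stated assumptions and not the GRASP test itself: the true η is assumed known here (in practice it is an oracle reachable only through label queries), the KL divergence is chosen as one example of an f-divergence, and the names `bernoulli_kl`, `mean_divergence`, and `tau` are hypothetical.

```python
import math


def bernoulli_kl(p, q, eps=1e-12):
    """KL divergence D_KL(Bern(p) || Bern(q)).

    KL is the f-divergence generated by f(t) = t * log(t); it is used here
    only as an example choice of D_f (an assumption, not the paper's choice).
    """
    # Clip q away from 0 and 1 so the logarithms stay finite.
    q = min(max(q, eps), 1.0 - eps)
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def mean_divergence(eta_true, eta_hat):
    """Sample average of D_f(Bern(eta(x_i)) || Bern(eta_hat(x_i)))."""
    pairs = list(zip(eta_true, eta_hat))
    return sum(bernoulli_kl(p, q) for p, q in pairs) / len(pairs)


# Hypothetical values of the true and estimated positive-label probabilities
# at four feature points (assumed known purely for illustration).
eta_true = [0.10, 0.50, 0.90, 0.70]
eta_hat = [0.15, 0.45, 0.80, 0.70]

tau = 0.05  # hypothetical tolerance level
avg = mean_divergence(eta_true, eta_hat)
print(f"average divergence = {avg:.4f}, within tolerance: {avg <= tau}")
```

A sketch like this only evaluates the population-style quantity when η is given; the tests described in the abstract are needed precisely because η is not observable and must be probed through sampled labels.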