P-values for classification

Abstract

Let (X,Y) be a random variable consisting of an observed feature vector X ∈ 𝒳 and an unobserved class label Y ∈ {1,2,...,L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X,Y). Usual classification procedures provide point predictors (classifiers) Ŷ(X,D) of Y or estimate the conditional distribution of Y given X. In order to quantify the certainty of classifying X, we propose to construct for each θ = 1,2,...,L a p-value π_θ(X,D) for the null hypothesis that Y = θ, treating Y temporarily as a fixed parameter. In other words, the point predictor Ŷ(X,D) is replaced with a prediction region for Y with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
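To make the idea of class-wise p-values concrete, the following is a minimal sketch and not necessarily the construction used in the paper. It assumes a rank-based (permutation-style) p-value: for each candidate label θ, the new point is temporarily added to the training points of class θ, every point in that augmented group gets a score measuring how atypical it is, and π_θ is the fraction of scores at least as large as the new point's score. The distance-to-centroid score and the names class_p_values and prediction_region are illustrative assumptions.

```python
import math


def class_p_values(x, data, labels):
    """Return {theta: p-value} for the null hypotheses Y = theta.

    Assumption: the atypicality score is the Euclidean distance to the
    centroid of the augmented class (training points of class theta plus x).
    """
    p_values = {}
    for theta in sorted(set(labels)):
        # Treat the new point as if its label were theta.
        group = [xi for xi, yi in zip(data, labels) if yi == theta] + [x]
        dim = len(x)
        centroid = [sum(pt[j] for pt in group) / len(group) for j in range(dim)]
        # Scores are computed the same way for every member of the group,
        # so they are exchangeable when the null hypothesis Y = theta holds.
        scores = [math.dist(pt, centroid) for pt in group]
        new_score = scores[-1]
        # Fraction of group members at least as atypical as the new point.
        p_values[theta] = sum(s >= new_score for s in scores) / len(group)
    return p_values


def prediction_region(p_values, alpha):
    """Labels whose null hypothesis is not rejected at level alpha."""
    return {theta for theta, p in p_values.items() if p > alpha}


# Toy training data: class 1 near the origin, class 2 near (10, 10).
data = [(0, 0), (1, 0), (0, 1), (1, 1), (10, 10), (11, 10), (10, 11), (11, 11)]
labels = [1, 1, 1, 1, 2, 2, 2, 2]

pvals = class_p_values((0.5, 0.5), data, labels)
region = prediction_region(pvals, alpha=0.25)
```

In this toy example the new point sits in the middle of class 1, so only label 1 survives at level 0.25. With just four training points per class the smallest attainable p-value is 1/5, which means no label can be rejected at levels below 0.2; the resolution of such p-values is limited by the class sizes.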
