How Many and Which Training Points Would Need to be Removed to Flip this Prediction?

Byron C. Wallace

Abstract

We consider the problem of identifying a minimal subset of training data $S_t$ such that, had the instances comprising $S_t$ been removed prior to training, the categorization of a given test point $x_t$ would have been different. Identifying such a set may be of interest for two reasons. First, the cardinality of $S_t$ provides a measure of robustness (if $|S_t|$ is small for $x_t$, we might be less confident in the corresponding prediction), which we show is correlated with, but complementary to, predicted probabilities. Second, interrogation of $S_t$ may provide a novel mechanism for contesting a particular model prediction: if one can make the case that the points in $S_t$ are wrongly labeled or irrelevant, this may argue for overturning the associated prediction. Identifying $S_t$ via brute force is intractable, since a training set of $n$ points has $2^n$ candidate subsets and checking each one requires retraining the model. We propose comparatively fast approximation methods to find $S_t$ based on influence functions, and find that, for simple convex text classification models, these approaches can often successfully identify relatively small sets of training examples which, if removed, would flip the prediction.
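To make the influence-function idea concrete, here is a minimal sketch, assuming an L2-regularized logistic regression (a simple convex classifier) fit by Newton's method. It uses the standard first-order approximation that removing training point $i$ changes the test logit by roughly $x_t^\top H^{-1} \nabla \ell_i$, where $H$ is the Hessian of the training objective and $\nabla \ell_i$ is the gradient of point $i$'s loss at the fitted parameters. Points are ranked by this estimate, the estimates are treated as additive to pick a candidate set size, and the candidate is then checked by actually retraining. The function names (`fit_logreg`, `estimated_logit_changes`, `find_flip_set`), the Newton solver, and the greedy ranking are illustrative assumptions, not necessarily the procedure used in the paper.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def hessian(X, w, lam):
    """Hessian of the L2-regularized logistic loss (summed over examples) at w."""
    p = sigmoid(X @ w)
    return (X * (p * (1 - p))[:, None]).T @ X + lam * np.eye(X.shape[1])


def fit_logreg(X, y, lam=1.0, iters=50):
    """Fit L2-regularized logistic regression (labels in {0, 1}) by Newton's method.
    Assumption: the bias is a column of ones in X and is regularized like the other weights."""
    w = np.zeros(X.shape[1])
    for _ in range(iters):
        grad = X.T @ (sigmoid(X @ w) - y) + lam * w
        step = np.linalg.solve(hessian(X, w, lam), grad)
        w -= step
        if np.linalg.norm(step) < 1e-10:
            break
    return w


def estimated_logit_changes(X, y, w, x_test, lam=1.0):
    """First-order estimate of the change in the test logit x_test @ w when each
    training point i is removed: delta_i ~= x_test^T H^{-1} g_i."""
    G = X * (sigmoid(X @ w) - y)[:, None]            # per-example gradients g_i, shape (n, d)
    v = np.linalg.solve(hessian(X, w, lam), x_test)  # H^{-1} x_test (H is symmetric)
    return G @ v


def find_flip_set(X, y, x_test, lam=1.0, max_remove=None):
    """Hypothetical greedy influence-based search for training indices whose removal
    flips the predicted class of x_test. Returns None if no flip is found.
    The returned set is verified by retraining but is not guaranteed to be minimal."""
    n = X.shape[0]
    max_remove = n - 1 if max_remove is None else max_remove
    w = fit_logreg(X, y, lam)
    f0 = x_test @ w
    positive = f0 > 0  # predicted class is 1 iff the logit is > 0
    delta = estimated_logit_changes(X, y, w, x_test, lam)
    # Rank points by how strongly removing them pushes the logit across zero.
    order = np.argsort(delta) if positive else np.argsort(-delta)
    # Treat removals as additive and find the smallest k predicted to flip.
    cum = f0 + np.cumsum(delta[order])
    predicted_flip = (cum <= 0) if positive else (cum > 0)
    if not predicted_flip.any():
        return None
    start = int(np.argmax(predicted_flip)) + 1
    # Verify by actually retraining; grow the set if the estimate was too optimistic.
    for k in range(start, max_remove + 1):
        keep = np.ones(n, dtype=bool)
        keep[order[:k]] = False
        w_k = fit_logreg(X[keep], y[keep], lam)
        if (x_test @ w_k > 0) != positive:
            return order[:k]
    return None
```

Retraining only from the estimated set size upward keeps the number of refits small compared with searching over subsets, which is the point of using influence estimates; the trade-off is that the first-order, additive approximation can misjudge the size of the smallest flipping set.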
