Early Cue Precision Shapes Visual Shortcut Learning in Controlled Cue-Manipulation Benchmarks
Abstract
Visual classifiers can achieve high matched-distribution accuracy while relying on low-level cues that fail under conflict or suppression. We test whether this failure is shaped by early cue precision: the reliability with which a low-level cue predicts the label during early learning or downstream probe fitting. Across synthetic shape-texture tasks, sequential digit training, a 10-class frozen-representation audit, and a CIFAR-10 natural-image-based texture-overlay benchmark, we manipulate object-texture match probability and evaluate matched-ID accuracy, conflict accuracy, texture-choice rate, and suppression behavior. Degraded-but-predictive input does not substitute for cue decorrelation. In 10-class digit probes, conflict accuracy drops from 0.589 under chance-like cue precision to 0.005 under target-perfect texture. In CIFAR-10 frozen probes, conflict accuracy drops from 0.569 to 0.114, while texture choice rises from 0.049 to 0.855; this ordering persists across texture-overlay strengths alpha in 0.15,0.25,0.35,0.50. End-to-end CIFAR-10 training shows that low early cue precision improves pre-target conflict behavior, but shortcut-rich fine-tuning can rapidly overwrite this benefit. Cue decorrelation must therefore be maintained during downstream adaptation rather than treated as a one-time inoculation.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.