When Style Similarity Scores Fail: Diagnosing Raw CSD Cosine in Artist-Style Evaluation

Abstract

Raw cosine in the 768-dimensional output space of the Contrastive Style Descriptor (CSD) is now widely read as an absolute, calibrated style-fidelity score for text-to-image and style-imitation evaluation. We introduce the discrimination gap, a corpus-internal, prototype-free and threshold-free diagnostic that tests whether contrastive style cosines admit an absolute same-versus-different interpretation on a candidate artist corpus. On a 1799-artwork, 91-artist public-domain corpus, raw CSD cosine yields negative point-estimate gaps for 23/91 artists at the pairwise level (2/91 robust under bootstrap) and for 15/91 in the aggregated-pool scoring regime style-fidelity evaluations typically use. CSLS readout on the frozen backbone reduces the aggregated negative-gap count to 4/91; combined with positional-embedding interpolation to 336 pixels it raises unsupervised pair-verification AUC from 0.883 to 0.905 across 25 artist-disjoint splits. We refer to this diagnostic-driven readout protocol on the frozen backbone (CSLS as default, pos-interp 336 as the stronger optional setting) as CSD+, not a new encoder.A cross-backbone check on CLIP-ViT-L/14, SigLIP-large and DINOv2-Large reproduces the same shared-tradition failure pattern, providing evidence that the residual reflects a shared limitation of the four backbones we tested rather than a CSD-specific artefact. Practical implication: before reporting CSD cosine as an absolute style-fidelity score, run the diagnostic on the candidate corpus; CSLS is the minimal correction when it fails.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…