Semantically Guided Dynamic Visual Prototype Refinement for Compositional Zero-Shot Learning

Hongwei Liu

doi:10.1016/j.neucom.2026.132775

Semantically Guided Dynamic Visual Prototype Refinement for Compositional Zero-Shot Learning

Abstract

Compositional Zero-Shot Learning (CZSL) seeks to recognize unseen state-object pairs by recombining primitives learned from seen compositions. Despite recent progress with vision-language models (VLMs), two limitations remain: (i) text-driven semantic prototypes are weakly discriminative in the visual feature space; and (ii) unseen pairs are optimized passively, thereby inducing seen bias. To address these limitations, we present Duplex, a framework that couples dual-prototype learning with dynamic local-graph refinement of visual prototypes. For each composition, Duplex maintains a semantic prototype via prompt learning and a visual prototype for unseen pairs constructed by recombining disentangled state and object primitives from seen images. The visual prototypes are updated dynamically through lightweight aggregation on mini-batch local graphs, which incorporates unseen compositions during training without labels. This design introduces fine-grained visual evidence while preserving semantic structure. It enriches class prototypes, better disambiguates semantically similar yet visually distinct pairs, and mitigates seen bias. Experiments on MIT-States, UT-Zappos, and CGQA in closed-world and open-world settings achieve competitive performance and consistent compositional generalization. Our source code is available at https://github.com/ISPZ/Duplex-CZSL.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Or compile a full topic from this idea

Discussion (0)

Sign in to join the discussion.

Loading comments…