Exploiting Vision Encoder Vulnerabilities for Universal Adversarial Perturbations on Large Vision-Language Models

Abstract

Large Vision-Language Models (LVLMs) have achieved remarkable performance on multimodal tasks but remain highly vulnerable to small adversarial perturbations in input images. Existing attacks typically target the vision encoder's final output embeddings, implicitly treating the encoder as a uniform attack surface, while a systematic analysis of which internal components are most vulnerable has remained largely unexplored. We show such analysis is essential, as adversarial vulnerability in LVLM vision encoders is structurally concentrated rather than uniformly distributed. Building on this, we propose Vision Encoder Vulnerable-Component-Targeted Universal Adversarial Perturbation (VEV-UAP), a task-agnostic and cost-efficient attack framework. Through a component- and layer-wise analysis of attention mechanisms, we identify the value components in middle layers as critical vulnerabilities that strongly influence downstream language model behavior. VEV-UAP selectively targets these components to generate a single universal perturbation shared across images, without involving textual inputs or the language model during optimization. Experiments across multiple LVLMs and tasks show VEV-UAP achieves state-of-the-art attack success rates with reduced computational overhead. Moreover, a single VEV-UAP transfers across LVLMs sharing the same vision encoder, even when paired with different language models, making it a practical framework for scalable robustness evaluation.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…