Detecting AI-Generated Images via Distributional Deviations from Real Images

Abstract

The rapid advancement of generative models has significantly enhanced the quality of AI-generated images, raising concerns about misinformation and the erosion of public trust. Detecting AI-generated images has thus become a critical challenge, particularly in terms of generalizing to unseen generative models. Existing methods using frozen pre-trained CLIP models show promise in generalization but treat the image encoder as a basic feature extractor, failing to fully exploit its potential. In this paper, we perform an in-depth analysis of the frozen CLIP image encoder (CLIP-ViT), revealing that it effectively clusters real images in a high-level, abstract feature space. However, it does not truly possess the ability to distinguish between real and AI-generated images. Based on this analysis, we propose a Masking-based Pre-trained model Fine-Tuning (MPFT) strategy, which introduces a Texture-Aware Masking (TAM) mechanism to mask textured areas containing generative model-specific patterns during fine-tuning. This approach compels CLIP-ViT to attend to the "distributional deviations" from authentic images for AI-generated image detection, thereby achieving enhanced generalization performance. Extensive experiments on the GenImage and UniversalFakeDetect datasets demonstrate that our method, fine-tuned with only a minimal number of images, significantly outperforms existing approaches, achieving up to 98.2% and 94.6% average accuracy on the two datasets, respectively.
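To make the idea of masking textured areas concrete, here is a minimal sketch of patch-level masking. It is not the authors' TAM implementation: the abstract does not say how texture is measured or how patches are selected, so the per-patch pixel variance used as a texture proxy, the function names `patch_texture_scores` and `texture_aware_mask`, and the `patch` and `mask_ratio` parameters are all assumptions made for illustration.

```python
import numpy as np


def patch_texture_scores(image: np.ndarray, patch: int) -> np.ndarray:
    """Return one texture score per non-overlapping patch.

    ASSUMPTION: pixel variance stands in for "texture"; the abstract does
    not specify the measure that TAM actually uses.
    """
    h, w = image.shape[:2]
    gh, gw = h // patch, w // patch
    # Crop to a whole number of patches and work in float.
    gray = image[: gh * patch, : gw * patch].astype(np.float64)
    if gray.ndim == 3:
        gray = gray.mean(axis=2)  # collapse colour channels
    # Shape (gh, gw, patch, patch): one block of pixels per patch.
    blocks = gray.reshape(gh, patch, gw, patch).swapaxes(1, 2)
    return blocks.var(axis=(2, 3))


def texture_aware_mask(image: np.ndarray, patch: int = 16, mask_ratio: float = 0.25):
    """Zero out the most textured patches (hypothetical selection rule).

    Returns the masked image and a boolean grid marking the masked patches.
    """
    scores = patch_texture_scores(image, patch)
    n_mask = int(round(mask_ratio * scores.size))
    flat_mask = np.zeros(scores.size, dtype=bool)
    if n_mask > 0:
        # Indices of the n_mask highest-variance patches.
        top = np.argsort(scores.ravel())[::-1][:n_mask]
        flat_mask[top] = True
    mask = flat_mask.reshape(scores.shape)

    gh, gw = scores.shape
    # Expand the patch grid back to pixel resolution.
    pixel_mask = np.repeat(np.repeat(mask, patch, axis=0), patch, axis=1)
    masked = image.copy()
    masked[: gh * patch, : gw * patch][pixel_mask] = 0
    return masked, mask
```

In a fine-tuning loop, a function like this would presumably be applied to each training image before it reaches CLIP-ViT, so that the encoder cannot rely on the masked regions; how the paper integrates masking into training is not described in the abstract.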


