SDGIC: A Semantic Disambiguation-Guided Generative Image Compression Method for Ultra-Low Bitrates

Fan Li

Abstract

Generative image compression has recently shown impressive perceptual quality, but it often suffers from semantic inconsistency at ultra-low bitrates (bpp < 0.05), limiting its reliable deployment in bandwidth-constrained scenarios such as 6G semantic communications. This inconsistency stems from incomplete guidance information, which introduces semantic ambiguity into the generation process and may lead to natural-looking but source-inconsistent content. In this work, we propose a Semantic Disambiguation-Guided Generative Image Compression (SDGIC) framework to constrain diffusion-based reconstruction at ultra-low bitrates. Specifically, SDGIC compresses the source image into three compact and complementary guidance streams: a concise text caption for global semantics, a highly compressed image (HCI) for dense visual evidence, and Reconstruction-Aware Semantic Residual Tokens (RSRTs) for reconstruction-relevant residual semantics that remain ambiguous given the text caption and the HCI. The RSRTs are optimized directly toward the downstream denoising objective, enabling them to provide source-specific semantic constraints that disambiguate diffusion-based reconstruction. To inject these three guidance streams into the generation process effectively, we design a Dual-Path Conditioned Diffusion Decoder (DPCD), which uses cross-attention for the semantic conditions and ControlNet residuals for the dense visual guidance. Extensive experiments demonstrate that SDGIC improves semantic consistency at ultra-low bitrates while maintaining favorable perceptual quality, with a 23.4% reduction in AFINE on the CLIC2020 dataset.
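
To make the ultra-low-bitrate regime concrete, the sketch below computes bits per pixel (bpp) for an image whose bitstream is split across the three guidance streams named in the abstract. The byte counts, the 768x512 resolution, and the helper name `bits_per_pixel` are illustrative assumptions and are not taken from the paper. At this resolution, the 0.05 bpp threshold corresponds to a total budget of about 19,660 bits, or roughly 2.4 KB, shared by all streams.

```python
def bits_per_pixel(stream_bytes, width, height):
    """Total bits across all streams divided by the number of pixels."""
    total_bits = 8 * sum(stream_bytes.values())
    return total_bits / (width * height)


# Hypothetical per-stream sizes in bytes for one 768x512 image.
# These numbers are placeholders for illustration, not reported results.
streams = {
    "caption": 60,   # concise text caption (global semantics)
    "hci": 1200,     # highly compressed image (dense visual evidence)
    "rsrt": 300,     # semantic residual tokens
}

bpp = bits_per_pixel(streams, width=768, height=512)
is_ultra_low = bpp < 0.05  # threshold stated in the abstract
print(f"{bpp:.4f} bpp, ultra-low bitrate: {is_ultra_low}")
```

Under these assumed sizes the total stays below the threshold; how the budget is actually divided among the three streams is not specified in the abstract.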
