LUMEN: Low-light Unified Multi-stage Enhancement Network using depth-guided flash, clustering, and attention-based Transformers
Abstract
Low-light image enhancement remains a challenging problem due to severe noise, color distortion, contrast degradation, and loss of structural details under insufficient illumination. Existing methods typically apply uniform enhancement without considering the depth-dependent nature of light attenuation and sensor noise in real-world scenes. To address this limitation, we propose LUMEN, a multi-stage enhancement framework that integrates virtual flash simulation with transformer-based feature fusion. The proposed framework first estimates scene depth from low-light inputs using a dedicated encoder-decoder network, after which a soft clustering module partitions pixels into depth-aware regions, enabling depth-dependent flash simulation. The simulated flash features, together with depth representations, are fused with image features through efficient attention-based fusion blocks to enhance global context while preserving fine details. A composite loss function combining reconstruction, perceptual, structural, color, edge, and depth consistency objectives ensures both visual fidelity and perceptual quality. Extensive experiments on the LOL-v1 and LOL-v2 benchmarks demonstrate that LUMEN achieves state-of-the-art performance and produces visually natural results compared with several existing methods.
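To make the soft clustering and depth-dependent flash steps concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not the paper's implementation: the abstract does not specify the clustering rule or the flash model, so the softmax-over-distance assignment, the inverse-square gain, and the names `soft_depth_clusters`, `virtual_flash`, `centers`, `temperature`, and `strength` are all hypothetical. In LUMEN the depth map would come from the learned encoder-decoder; here it is simply passed in.

```python
import numpy as np

def soft_depth_clusters(depth, centers, temperature=0.1):
    """Soft-assign each pixel to K depth clusters (assumed rule: softmax over
    negative squared distance to each cluster center)."""
    centers = np.asarray(centers, dtype=float)
    d2 = (depth[..., None] - centers) ** 2           # (H, W, K)
    logits = -d2 / temperature
    logits -= logits.max(axis=-1, keepdims=True)     # numerical stability
    w = np.exp(logits)
    return w / w.sum(axis=-1, keepdims=True)         # rows sum to 1

def virtual_flash(image, depth, centers, strength=1.0, temperature=0.1):
    """Brighten an image with a per-pixel gain that depends on depth.
    Assumption: each cluster's gain follows an inverse-square falloff."""
    centers = np.asarray(centers, dtype=float)
    weights = soft_depth_clusters(depth, centers, temperature)  # (H, W, K)
    gains = strength / centers ** 2                              # (K,)
    gain_map = (weights * gains).sum(axis=-1)                    # (H, W)
    return np.clip(image * (1.0 + gain_map[..., None]), 0.0, 1.0)

# Toy example: top row is near (depth 1), bottom row is far (depth 4).
image = np.full((2, 2, 3), 0.1)
depth = np.array([[1.0, 1.0], [4.0, 4.0]])
centers = [1.0, 4.0]
weights = soft_depth_clusters(depth, centers)
flashed = virtual_flash(image, depth, centers)
```

In this toy case the near pixels receive a larger gain than the far ones, which is the depth-dependent behavior the abstract describes. Because the assignment is soft, the gain varies smoothly across depth boundaries rather than jumping between regions.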