Unsupervised Deep Multi-focus Image Fusion
Abstract
Convolutional neural networks have recently been used for multi-focus image fusion. However, due to the lack of labeled data for supervised training of such networks, existing methods have resorted to adding Gaussian blur in focused images to simulate defocus and generate synthetic training data with ground-truth for supervised learning. Moreover, they classify pixels as focused or defocused and leverage the results to construct the fusion weight maps which then necessitates a series of post-processing steps. In this paper, we present unsupervised end-to-end learning for directly predicting the fully focused output image from multi-focus input image pairs. The proposed approach uses a novel CNN architecture trained to perform fusion without the need for ground truth fused images and exploits the image structural similarity (SSIM) to calculate the loss; a metric that is widely accepted for fused image quality evaluation. Consequently, we are able to utilize real benchmark datasets, instead of simulated ones, to train our network. The model is a feed-forward, fully convolutional neural network that can process images of variable sizes during test time. Extensive evaluations on benchmark datasets show that our method outperforms existing state-of-the-art in terms of visual quality and objective evaluations.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.