Towards Temporal Change Explanations from Bi-Temporal Satellite Images

Abstract

Explaining temporal changes between satellite images taken at different times is important for urban planning and environmental monitoring. However, manual dataset construction for the task is costly, so human-AI collaboration is promissing. Toward the direction, in this paper, we investigate the ability of Large-scale Vision-Language Models (LVLMs) to explain temporal changes between satellite images. While LVLMs are known to generate good image captions, they receive only a single image as input. To deal with a par of satellite images as input, we propose three prompting methods. Through human evaluation, we found the effectiveness of our step-by-step reasoning based prompting.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…