FujiView: Multimodal Late-Fusion for Predicting Scenic Visibility

Abstract

Visibility of natural landmarks such as Mount Fuji is a defining factor in both tourism planning and visitor experience, yet it remains difficult to predict due to rapidly changing atmospheric conditions. We present FujiView, a multimodal learning framework and dataset for predicting scenic visibility by fusing webcam imagery with structured meteorological data. Our late-fusion approach combines image-derived class probabilities with numerical weather features to classify visibility into five categories. The dataset currently comprises over 100,000 webcam images paired with concurrent and forecasted weather conditions from more than 40 cameras around Mount Fuji, and continues to expand; it will be released to support further research in environmental forecasting. Experiments show that YOLO-based vision features dominate short-term horizons such as "nowcasting" and "samedaycasting", while weather-driven forecasts increasingly take over as the primary predictive signal beyond +1d. Late fusion consistently yields the highest overall accuracy, achieving ACC of approx 0.89 for same-day prediction and up to 84% for next-day forecasts. These results position Scenic Visibility Forecasting (SVF) as a new benchmark task for multimodal learning.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…