Skin-R1: Clinical Knowledge-Guided Dermatological Diagnosis Using Vision-Language Models

Vasant G Honavar

Abstract

Vision-language models (VLMs) have recently shown promise for assisting clinical reasoning in dermatological diagnosis. However, their trustworthiness and clinical utility remain limited by three key challenges: heterogeneous datasets with inconsistent diagnostic labels and concept annotations, the lack of grounded diagnostic rationales for reliable reasoning supervision, and limited scalability when transferring knowledge from small, densely annotated datasets to large collections with sparse labels. To address these challenges, we propose Skin-R1, a dermatology-oriented VLM that integrates textbook-grounded clinical reasoning supervision with reinforcement learning (RL) to improve the accuracy and robustness of diagnostic prediction. First, we construct a textbook-based reasoning generator that synthesizes hierarchy-aware and differential-diagnosis (DDx) diagnostic trajectories derived from authoritative dermatology knowledge. Second, these trajectories are used for supervised fine-tuning (SFT), establishing a clinically grounded reasoning foundation for the model. Finally, we introduce an RL training framework that incorporates the hierarchical structure of dermatological diseases into the reward design, enabling the model to generalize grounded diagnostic reasoning to large-scale datasets with sparse annotations. Extensive experiments across multiple dermatology benchmarks demonstrate that Skin-R1 consistently improves diagnostic accuracy and robustness compared to state-of-the-art Med-VLM baselines. Ablation studies further highlight the critical role of grounded reasoning supervision introduced during the SFT stage.
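The abstract does not specify how the disease hierarchy enters the RL reward, so the following is only an illustrative sketch of one plausible form: partial credit proportional to how much of the root-to-leaf path the predicted diagnosis shares with the ground-truth label. The toy taxonomy `PARENT`, the function names, and the scoring rule are all assumptions made for illustration and are not taken from the paper.

```python
from typing import Dict, List, Optional

# Hypothetical toy taxonomy (child -> parent). Illustrative only;
# the paper's actual disease hierarchy is not given in the abstract.
PARENT: Dict[str, Optional[str]] = {
    "skin disease": None,
    "inflammatory": "skin disease",
    "neoplastic": "skin disease",
    "psoriasis": "inflammatory",
    "eczema": "inflammatory",
    "melanoma": "neoplastic",
}


def path_from_root(label: str) -> List[str]:
    """Return the list of nodes from the taxonomy root down to `label`."""
    path: List[str] = []
    node: Optional[str] = label
    while node is not None:
        path.append(node)
        node = PARENT[node]
    return path[::-1]


def hierarchical_reward(predicted: str, target: str) -> float:
    """Assumed reward: 1.0 for an exact match, partial credit for a
    prediction that shares ancestors with the target, 0.0 otherwise."""
    if predicted not in PARENT:
        return 0.0  # prediction is not a known diagnosis
    if predicted == target:
        return 1.0
    pred_path = path_from_root(predicted)
    true_path = path_from_root(target)
    # Count how many leading nodes the two paths have in common.
    shared = 0
    for p, t in zip(pred_path, true_path):
        if p != t:
            break
        shared += 1
    # The root is shared by every label, so it earns no credit.
    return (shared - 1) / (max(len(pred_path), len(true_path)) - 1)


if __name__ == "__main__":
    for guess in ["psoriasis", "eczema", "melanoma", "unknown"]:
        print(guess, hierarchical_reward(guess, "psoriasis"))
```

Under this assumed rule, confusing two diseases in the same category (eczema for psoriasis) is penalized less than confusing diseases from different categories (melanoma for psoriasis), which is one way a reward could stay informative on datasets with only coarse labels.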
