Demographic-aware fine-grained visual recognition of pediatric wrist pathologies

Abstract

Recognizing pediatric wrist pathologies from radiographs is challenging because normal anatomy changes rapidly with development: evolving carpal ossification and open physes can resemble pathology, and maturation timing differs by sex. Image-only models trained on limited medical datasets therefore risk confusing normal developmental variation with true pathology. We address this by framing pediatric wrist diagnosis as a fine-grained visual recognition (FGVR) problem and proposing a demographic-aware hybrid convolution-transformer model that fuses X-rays with patient age and sex. To leverage demographic context while avoiding shortcut reliance on it, we introduce progressive metadata masking during training. We evaluate on a curated dataset that mirrors the typical constraints of real-world medical studies. The hybrid FGVR backbone outperforms traditional and modern CNNs, and demographic fusion yields additional gains. Finally, we show that initializing from a fine-grained pretraining source improves transfer relative to standard ImageNet initialization, suggesting that label granularity, even from non-medical data, can be a key driver of generalization for subtle radiographic findings.
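
To make the idea of progressive metadata masking concrete, the following is a minimal sketch, not the authors' implementation. The abstract does not specify the schedule, so the linear ramp, the end probability of 0.5, the age normalization by 18 years, the sex encoding, and the `present` flag are all assumptions introduced here for illustration, as are the function names `mask_probability` and `mask_metadata`.

```python
import random


def mask_probability(epoch, total_epochs, p_start=0.0, p_end=0.5):
    """Masking probability for a given epoch.

    Assumption: a linear ramp from p_start to p_end; the abstract does
    not state the actual schedule or its direction.
    """
    if total_epochs <= 1:
        return p_end
    frac = min(max(epoch / (total_epochs - 1), 0.0), 1.0)
    return p_start + frac * (p_end - p_start)


def mask_metadata(age_years, sex, p_mask, rng):
    """Encode age and sex, hiding both with probability p_mask.

    Assumption: masked metadata is zeroed and flagged via 'present' so
    a fusion module could tell "missing" apart from a real value.
    """
    if rng.random() < p_mask:
        return {"age": 0.0, "sex": 0.0, "present": 0.0}
    return {
        "age": age_years / 18.0,              # hypothetical normalization
        "sex": 1.0 if sex == "F" else -1.0,   # hypothetical encoding
        "present": 1.0,
    }


if __name__ == "__main__":
    rng = random.Random(0)
    for epoch in range(5):
        p = mask_probability(epoch, total_epochs=5)
        print(epoch, round(p, 3), mask_metadata(9, "F", p, rng))
```

Under these assumptions, the model sees complete demographics early in training and is increasingly forced to rely on the image alone, which is one plausible way to discourage shortcut reliance on age and sex.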
