ZODS-RS -- Zero-training Oriented Detection & Segmentation for Remote Sensing
Abstract
Remote-sensing and UAV applications need models that generalize across platforms and viewpoints without task-specific training. Yet training-free pipelines often falter on oriented geometry, scale and rotation variation, and crowded scenes such as ports or airfields, and they rarely unify detection and segmentation. We introduce ZODS-RS, a training-free, closed-form pipeline that outputs horizontal bounding boxes (HBB) and instance masks. Built on DINOv3 dense features and SAM-style proposals, ZODS-RS chains three stages: PP (prototype purification via Tyler covariance), R-SEM (rotation-scale equivariant matching with separable kernels and global Hungarian assignment), and UAM (uncertainty-aware pixelwise merging with adaptive priors and optional negative prototypes). A lightweight CWLA module fuses multiple DINOv3 layers. On FAIR1M (HBB) we obtain mAP@0.50:0.95 = 13.06 and AP_S = 2.93 (class-averaged over ship and airplane); on xView (HBB) we report mAP = 16.69. On our UAV dataset, ZODS-RS achieves a mask mIoU of 31.10 and improves small-object AP by 30.70 over Grounded-SAM, running on a single 5090 GPU. This work offers a unified, no-training solution for horizontal-box detection plus instance segmentation in aerial imagery; provides explicit closed-form formulations for PP, R-SEM, and UAM that are tightly coupled with DINOv3; and demonstrates consistent gains on small and crowded targets and under cross-domain shifts while keeping deployment simple.
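As a reading aid for the HBB numbers above, the sketch below shows the overlap test that a metric averaged over IoU thresholds 0.50 to 0.95 is normally built on, reading the notation in the usual COCO sense (steps of 0.05). It is not the authors' evaluation code: the function names and the (x_min, y_min, x_max, y_max) box layout are assumptions made for illustration, and a full mAP computation would also rank predictions by score and integrate precision over recall.

```python
def hbb_iou(a, b):
    """Intersection over union of two horizontal boxes.

    Boxes are (x_min, y_min, x_max, y_max); this layout is an assumption,
    not something the abstract specifies.
    """
    # Width and height of the overlap rectangle, clamped at zero.
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h

    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter

    # Degenerate (zero-area) boxes have no meaningful overlap.
    return inter / union if union > 0 else 0.0


# Ten thresholds: 0.50, 0.55, ..., 0.95 (assumed COCO-style spacing).
IOU_THRESHOLDS = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def thresholds_passed(pred, gt):
    """Return the IoU thresholds at which `pred` counts as a match for `gt`."""
    iou = hbb_iou(pred, gt)
    return [t for t in IOU_THRESHOLDS if iou >= t]


if __name__ == "__main__":
    pred = (0, 0, 10, 10)
    gt = (0, 0, 9, 8)
    print(hbb_iou(pred, gt))
    print(thresholds_passed(pred, gt))
```

In this example the prediction overlaps the ground truth with an IoU of 0.72, so it counts as a true positive at five of the ten thresholds (0.50 through 0.70) and as a miss at the stricter ones. That is why a metric averaged over 0.50:0.95 rewards tight localization much more than a single-threshold AP at 0.50 does.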