DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection

Abstract

Small object detection in complex scenes exposes a fundamental tension in neural network design: backbone attention distributes computation uniformly regardless of content, pyramid necks inflate activation magnitudes during upsampling without norm compensation, and bottleneck convolutions progressively smooth high-frequency edge components through accumulated spatial filtering. In response, we develop DFIR-DETR by tracing each proposed module back to a specific, measurable deficiency in the RT-DETR baseline: uniform attention that ignores spatial complexity, norm drift that destabilises upsampled features, and spatial convolutions that progressively suppress the high-frequency components small objects depend on. On NEU-DET and VisDrone, DFIR-DETR achieves 92.9% and 51.6% mAP50 with only 11.7M parameters and 47.2 GFLOPs, demonstrating consistent gains across two qualitatively different detection domains.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…