vSTMD: Visual Motion Detection for Extremely Tiny Target at Various Velocities
Abstract
Visual motion detection for extremely tiny (ET-) targets is challenging due to their category-independent nature and the scarcity of visual cues, which often incapacitate mainstream feature-based models. Natural architectures with rich interpretability offer a promising alternative: STMD architectures, derived from insect visual STMD (Small Target Motion Detector) pathways, have demonstrated their effectiveness. However, previous STMD models are constrained to a narrow velocity range, hindering their efficacy in real-world scenarios where targets exhibit diverse and unstable dynamics. To address this limitation, we present vSTMD, a learning-free model for motion detection of ET-targets at various velocities. Our key innovations include: (1) a cross-Inhibition Dynamic Potential (cIDP) that serves as a self-adaptive mechanism for efficiently capturing motion cues across a wide velocity spectrum, and (2) the first Collaborative Directional Gradient Calculation (CDGC) strategy, which enhances orienting accuracy and robustness while reducing computational overhead to one-eighth of that of previous isolated strategies. Evaluated on the real-world dataset RIST, the proposed vSTMD and its feedback-facilitated variant vSTMD-F achieve relative F1 gains of 30% and 58% over state-of-the-art (SOTA) STMD approaches, respectively. Furthermore, both models demonstrate competitive orientation estimation performance compared to SOTA deep learning-driven methods. Experiments also reveal the superiority of the natural architecture for ET-target motion detection: vSTMD is 60× faster than contemporary data-driven methods, making it highly suitable for real-time applications in dynamic scenarios and complex backgrounds. Code is available at https://github.com/MingshuoXu/vSTMD.
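The F1 gains quoted above are relative, not absolute: a 30% relative gain means the new F1 is 1.3 times the baseline F1, not 30 percentage points higher. The minimal sketch below shows the arithmetic under that reading. The function names and all numeric inputs are hypothetical illustrations, not values or code from the paper or its repository.

```python
def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 as the harmonic mean of precision and recall, from detection counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0.0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


def relative_gain(new: float, baseline: float) -> float:
    """Relative improvement of `new` over `baseline`, e.g. 0.30 for a 30% gain."""
    if baseline <= 0.0:
        raise ValueError("baseline must be positive")
    return (new - baseline) / baseline


# Hypothetical F1 values, chosen only to illustrate the arithmetic.
baseline_f1 = 0.50
improved_f1 = 0.65
print(f"relative F1 gain: {relative_gain(improved_f1, baseline_f1):.0%}")
```

With these made-up values the absolute difference is 0.15, while the relative gain is 30%, which is why the baseline F1 is needed to interpret the reported figures.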