Target-depth sensing with metasurface-encoder integrated optoelectronic neural network

Abstract

Accurate, real-time sensing of targets in three-dimensional (3D) environments is essential for modern machine vision, underpinning emerging technologies such as autonomous systems, robotic manipulation, augmented reality, and intelligent surveillance. However, state-of-the-art 3D sensing approaches typically rely on complex postprocessing of multi-view images or LiDAR point clouds, resulting in considerable computational load, power consumption, and latency. To address these challenges, we propose a metasurface-encoder integrated optoelectronic neural network architecture that compresses 3D information into two-dimensional images by encoding depth with a double-helix point spread function generated by a metasurface. The depth-encoded images are captured with a conventional monocular camera and subsequently processed by a lightweight shallow ResNet neural network. We experimentally validate the proposed architecture on the MNIST and Vehicle-Image datasets, achieving high accuracy in both target classification and depth estimation, thereby enabling real-time target tracking. The framework can be readily extended to other depth- or angle-encoding metasurfaces for multidimensional compression and detection. Our results demonstrate the effectiveness of the meta-optic-encoder/electronic-decoder paradigm in significantly reducing network complexity and computational burden while maintaining strong performance for smart vision sensing applications.
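
To make the depth-encoding idea concrete, the following is a minimal Python sketch, not the authors' implementation. It assumes a toy double-helix point spread function made of two Gaussian lobes whose orientation rotates linearly with depth; the lobe separation, lobe width, rotation rate, and the function names `dh_psf` and `estimate_depth` are all hypothetical. In place of the neural-network decoder described above, the sketch reads the depth back analytically from the lobe orientation using second-order image moments.

```python
import numpy as np

# Hypothetical toy parameters (assumptions, not values from the paper)
SIZE = 65            # image side length in pixels
LOBE_SEP = 10.0      # distance of each lobe from the image center, in pixels
LOBE_SIGMA = 2.0     # Gaussian width of each lobe, in pixels
RAD_PER_UNIT = 0.5   # assumed linear rotation rate: radians per unit depth


def _centered_grid():
    """Return (x, y) pixel coordinates measured from the image center."""
    c = (SIZE - 1) / 2
    y, x = np.mgrid[0:SIZE, 0:SIZE].astype(float)
    return x - c, y - c


def dh_psf(depth):
    """Toy double-helix PSF: two lobes rotated by an angle proportional to depth."""
    theta = RAD_PER_UNIT * depth
    x, y = _centered_grid()
    dx, dy = LOBE_SEP * np.cos(theta), LOBE_SEP * np.sin(theta)
    img = np.exp(-((x - dx) ** 2 + (y - dy) ** 2) / (2 * LOBE_SIGMA ** 2))
    img += np.exp(-((x + dx) ** 2 + (y + dy) ** 2) / (2 * LOBE_SIGMA ** 2))
    return img / img.sum()  # normalize to unit total intensity


def estimate_depth(img):
    """Recover depth from the lobe orientation via second-order image moments."""
    x, y = _centered_grid()
    mxx = (img * x * x).sum()
    myy = (img * y * y).sum()
    mxy = (img * x * y).sum()
    # Principal-axis angle of the intensity distribution, in (-pi/2, pi/2]
    theta = 0.5 * np.arctan2(2 * mxy, mxx - myy)
    return theta / RAD_PER_UNIT


if __name__ == "__main__":
    for true_depth in (-2.0, 0.0, 1.0):
        print(true_depth, round(float(estimate_depth(dh_psf(true_depth))), 4))
```

Because the two lobes are indistinguishable, the orientation is only defined modulo 180 degrees, so this toy estimator is unambiguous only over a limited depth range. For an extended target, the captured image is the scene blurred by the PSF rather than the PSF itself, which is one reason a learned decoder is a natural fit for that setting.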
