COMPASS: Complete Multimodal Fusion via Proxy Tokens and Shared Spaces for Ubiquitous Sensing

Abstract

Missing modalities in multimodal sensing cause not only information loss but also a fusion-interface mismatch: a fusion head trained on a canonical set of modality slots must operate on changing observed subsets at inference time. We propose Compass, an interface-complete fusion framework that restores this canonical slot structure before prediction. Each modality is assigned a fixed fusion slot. Observed modalities populate their slots with real representations, while absent modalities are filled with target-slot completion representations estimated from the observed sources. Multiple source-specific estimates for the same missing slot are aggregated into a single slot filler, allowing the same lightweight fusion operator to be applied under arbitrary missing-modality patterns. Training uses synthetic modality masking, slot-compatibility supervision, and representation-space stabilization to make completed slots compatible with real modality representations and useful for downstream recognition. Across XRF55, MM-Fi, and OctoNet, Compass improves robustness under diverse single- and multiple-missing settings, including controlled comparisons against imputation, distillation, and translation-style baselines. These results suggest that preserving the fusion interface is a simple and effective principle for robust multimodal sensing.
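The slot-completion idea can be illustrated with a minimal sketch. It is not the authors' implementation. The abstract does not specify how source-specific estimates are produced or aggregated, so this sketch assumes hypothetical per-pair translator functions and a plain mean as the aggregator; the names `complete_slots`, `fuse`, and `translators` are all illustrative.

```python
from typing import Callable, Dict, List, Optional, Tuple

Vector = List[float]
Translator = Callable[[Vector], Vector]


def complete_slots(
    observed: Dict[str, Optional[Vector]],
    slot_order: List[str],
    translators: Dict[Tuple[str, str], Translator],
) -> List[Vector]:
    """Return one representation per canonical slot, in slot_order.

    Observed modalities keep their real representation. Each missing slot
    is filled with the mean of the estimates produced from every observed
    source (mean aggregation is an assumption of this sketch).
    """
    present = [m for m in slot_order if observed.get(m) is not None]
    if not present:
        raise ValueError("at least one modality must be observed")

    slots: List[Vector] = []
    for target in slot_order:
        rep = observed.get(target)
        if rep is not None:
            slots.append(list(rep))  # real representation fills its own slot
            continue
        # One estimate of the missing slot per observed source modality.
        estimates = [translators[(src, target)](observed[src]) for src in present]
        dim = len(estimates[0])
        filler = [sum(e[i] for e in estimates) / len(estimates) for i in range(dim)]
        slots.append(filler)  # single aggregated slot filler
    return slots


def fuse(slots: List[Vector]) -> Vector:
    """Toy fusion operator: concatenate slots in canonical order."""
    return [x for slot in slots for x in slot]
```

Because every slot is always populated, `fuse` receives the same number of slots in the same order whichever modalities are missing, which is the interface-preservation property the abstract describes.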
