MOTOR: Learning ID-free Item Representation with Token Crossing for Embedding-based Multimodal Recommendation
Abstract
While multimodal recommendation models have effectively integrated visual and textual information, their reliance on unique ID embeddings constitutes a fundamental performance bottleneck. Specifically, ID-based paradigms suffer from three limitations: (1) Information Isolation, where unique IDs prevent semantic information exchange among related items; (2) Cold-Start Vulnerability, as ID embeddings are difficult to optimize with sparse interactions; and (3) Storage Inefficiency, where parameter costs scale linearly with item quantity. To overcome these challenges, we propose MOTOR, a novel ID-free MultimOdal TOken Representation scheme. MOTOR replaces explicit item IDs with learnable, shared multimodal tokens, fundamentally transforming the recommender into an ID-free framework. Methodologically, we first employ product quantization to discretize raw multimodal features into compact token IDs. These tokens serve as implicit item features, which are then synthesized via a novel Token Cross Network (TCN) to capture high-order interaction patterns. This "discretize-and-interact" mechanism enables semantic sharing across items and significantly compresses the model size without introducing complex auxiliary losses. Extensive experiments across nine mainstream models show that MOTOR significantly improves their recommendation performance. MOTOR also improves the ability of these models to recommend items in cold-start scenarios.
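The "discretize-and-interact" idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the quantizer settings or the internals of the Token Cross Network, so the sketch assumes plain k-means per subspace for product quantization and swaps in a factorization-machine-style pairwise crossing as a stand-in for TCN. All names and sizes (`product_quantize`, `token_tables`, `n_subspaces`, `n_codes`, and so on) are hypothetical, and the random features stand in for real visual or textual item features.

```python
import numpy as np


def kmeans(x, k, iters=10, seed=0):
    """Plain Lloyd's k-means; returns (centroids, assignments)."""
    rng = np.random.default_rng(seed)
    centroids = x[rng.choice(len(x), size=k, replace=False)].copy()
    for _ in range(iters):
        # Squared distance from every point to every centroid: shape (n, k).
        dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        assign = dist.argmin(1)
        for j in range(k):
            members = x[assign == j]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[j] = members.mean(0)
    dist = ((x[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    return centroids, dist.argmin(1)


def product_quantize(features, n_subspaces, n_codes):
    """Split each vector into sub-vectors and cluster every subspace separately.

    Returns integer token IDs of shape (n_items, n_subspaces).
    """
    assert features.shape[1] % n_subspaces == 0
    chunks = np.split(features, n_subspaces, axis=1)
    return np.stack(
        [kmeans(chunk, n_codes, seed=m)[1] for m, chunk in enumerate(chunks)],
        axis=1,
    )


# Assumed toy sizes; random features stand in for real multimodal features.
rng = np.random.default_rng(0)
n_items, feat_dim, n_subspaces, n_codes, emb_dim = 200, 32, 4, 8, 16
features = rng.normal(size=(n_items, feat_dim))

# Step 1: discretize features into token IDs that many items can share.
tokens = product_quantize(features, n_subspaces, n_codes)

# Step 2: look up shared token embeddings (one small table per subspace).
# In a real model these tables would be trained; here they are random.
token_tables = rng.normal(scale=0.1, size=(n_subspaces, n_codes, emb_dim))
token_emb = token_tables[np.arange(n_subspaces), tokens]  # (items, subspaces, dim)

# Step 3: cross the tokens. ASSUMPTION: a second-order, FM-style crossing
# (sum of element-wise products over all token pairs), not the paper's TCN.
first_order = token_emb.sum(axis=1)
second_order = 0.5 * (first_order ** 2 - (token_emb ** 2).sum(axis=1))
item_repr = first_order + second_order  # (n_items, emb_dim), no item ID used

# Embedding parameters: per-item ID table versus shared token tables.
id_params = n_items * emb_dim
token_params = n_subspaces * n_codes * emb_dim
```

In this sketch the token tables hold `n_subspaces * n_codes * emb_dim` parameters regardless of how many items exist, whereas an ID table grows with `n_items`. Two items that fall into the same cluster in a subspace share that token's embedding, which is the kind of semantic sharing the abstract attributes to the ID-free design.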