Continual Vision-and-Language Navigation
Abstract
Developing Vision-and-Language Navigation (VLN) agents typically assumes a train-once-deploy-once strategy, which is unrealistic as deployed agents continually encounter novel environments. To address this, we propose the Continual Vision-and-Language Navigation (CVLN) paradigm, where agents learn and adapt incrementally across multiple scene domains. CVLN includes two setups: Initial-instruction based CVLN for instruction-following, and Dialogue-based CVLN for dialogue-guided navigation. We also introduce two simple yet effective baselines for sequential decision-making: Perplexity Replay (PerpR), which replays difficult episodes, and Episodic Self-Replay (ESR), which stores and revisits action logits during training. Experiments show that existing continual learning methods fall short for CVLN, while PerpR and ESR achieve better performance by efficiently utilizing replay memory.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.