DriveFuture: Future-Aware Latent World Models for Autonomous Driving

Ziying Song

Abstract

Existing latent world models for autonomous driving have opened a promising path toward future-aware driving intelligence. However, they typically treat future latent states as prediction targets or auxiliary signals rather than as direct conditions for trajectory planning, which can entangle current and future features in latent space. In this work, we propose DriveFuture, a future-aware latent world modeling framework for autonomous driving that explicitly learns planning-oriented foresight by conditioning the modeling of the current latent state on future world states. Specifically, during training, the model first predicts future latent world states from the current latent state and the ego action, and then refines the prediction against the ground-truth future latent state via cross-attention. The resulting future-aware latent serves as an explicit condition for a diffusion-based trajectory planner. During inference, DriveFuture conditions on the predicted future latent state instead of the ground-truth future state. DriveFuture achieves state-of-the-art (SOTA) performance on the public NAVSIM benchmarks, reaching 55.5 EPDMS on NAVSIM-v2 navhard, 89.9 EPDMS on NAVSIM-v2 navtest, and 90.7 PDMS on NAVSIM-v1 navtest. These results suggest that the key to latent world modeling lies not merely in simulating future states but, more importantly, in conditioning current decision-making on them. Notably, as of April 2026, DriveFuture ranks 1st on the NAVSIM-v2 navhard leaderboard (https://huggingface.co/spaces/AGC2025/e2e-driving-navhard) and achieves SOTA performance on the NAVSIM-v1 navtest leaderboard (https://huggingface.co/spaces/AGC2024-P/e2e-driving-navtest).
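The difference between training and inference described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the additive stand-in predictor, and the single-head attention without learned projections are all assumptions made for illustration, and the diffusion-based planner is omitted entirely.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries, context):
    """Single-head scaled dot-product attention with no learned projections.

    Each query token is replaced by a weighted average of the context tokens.
    """
    dim = len(queries[0])
    attended = []
    for q in queries:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(dim) for k in context]
        weights = softmax(scores)
        attended.append(
            [sum(w * v[d] for w, v in zip(weights, context)) for d in range(dim)]
        )
    return attended


def predict_future(z_now, action):
    """Hypothetical stand-in predictor: shift every latent token by the ego action.

    A real world model would be a learned network; this is only a placeholder.
    """
    return [[z + a for z, a in zip(token, action)] for token in z_now]


def future_aware_latent(z_now, action, z_future_gt=None):
    """Return the latent that would condition the planner.

    Training (z_future_gt given): the predicted future latent queries the
    ground-truth future latent via cross-attention.
    Inference (z_future_gt is None): only the predicted future latent is used.
    """
    z_pred = predict_future(z_now, action)
    if z_future_gt is None:
        return z_pred
    return cross_attention(z_pred, z_future_gt)


z_now = [[0.0, 1.0], [1.0, 0.0]]   # two latent tokens of dimension 2
action = [0.5, -0.5]               # toy ego action with the same dimension

print(future_aware_latent(z_now, action))                            # inference path
print(future_aware_latent(z_now, action, [[1.0, 1.0], [2.0, 0.0]]))  # training path
```

In both modes the returned latent has the same shape, so a downstream planner could consume it without knowing whether ground-truth future information was available.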
