Compass vs Railway Tracks: Unpacking User Mental Models for Communicating Long-Horizon Work to Humans vs. AI
Abstract
As AI systems grow increasingly capable of operating for hours or days at a time, users' prompts are turning into elaborate specifications of work for the AI to carry out autonomously. While prompting for bounded, single-turn tasks has been studied extensively, less is known about how people communicate specifications for long-horizon tasks. We conducted a qualitative study in which 16 professionals drafted specifications for both a human colleague and an AI. The study revealed a core divergence: participants treated human delegation as a "compass", offering high-level intent to encourage flexible exploration. In contrast, communicating with AI resembled painstakingly laying down "railway tracks": rigid, exhaustive instructions meant to minimize ambiguity and deviation. This reflected a perception that current AI struggles to infer intent, prioritize, and make judgments on its own. When envisioning an ideal AI collaborator, participants wanted a hybrid that blends AI's efficiency and large context window with the critical thinking and agency of a human colleague. We discuss design implications for future AI systems, proposing that they align on outcomes through generated rough drafts, verify feasibility via end-to-end "test runs", and monitor execution through intelligent check-ins, ultimately transforming AI from a passive instruction-follower into a reliable collaborator for ambiguous, long-horizon tasks.