See and Switch: Vision-Based Branching for Interactive Robot-Skill Programming
Abstract
Programming by demonstration (PbD) makes robot programming accessible to non-experts, but scaling it to real-world variability remains a challenge for current teaching frameworks, especially when a robot must select suitable task variants online from visual input. We present See & Switch, an interactive teaching-and-execution framework that represents tasks as graphs of skill parts connected by decision states, enabling conditional branching during replay. Its vision-based Switcher uses eye-in-hand images to select the appropriate successor skill part and detect novel situations that require new demonstrations. The framework supports recovery demonstrations during execution through kinesthetic teaching, joystick control, and hand gestures. We evaluate See & Switch on three dexterous manipulation tasks with 8 novice users, collecting approx. 900 real-robot execution rollouts. To isolate visual decision performance from timing errors during decision states, we evaluate the Switcher offline using user-gated decision state windows. In the evaluation within the decision state windows, the method achieves up to 90.6% branch-selection accuracy and detects anomalies with >90% accuracy in 47 of 79 decision states, demonstrating reliable switching based on visual input for conditional robot-skill programming. We provide all code and experiment data at http://imitrob.ciirc.cvut.cz/publications/seeandswitch.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.