Estimating Perceptual Attributes of Haptic Textures Using Visuo-Tactile Data
Abstract
Accurate prediction of perceptual attributes of haptic textures is essential for advancing VR and AR applications and enhancing robotic interaction with physical surfaces. This paper presents a deep learning-based multi-modal framework, incorporating visual and tactile data, to predict perceptual texture ratings by leveraging multi-feature inputs. To achieve this, a four-dimensional haptic attribute space encompassing rough-smooth, flat-bumpy, sticky-slippery, and hard-soft dimensions is first constructed through psychophysical experiments, where participants evaluate 50 diverse real-world texture samples. A physical signal space is subsequently created by collecting visual and tactile data from these textures. Finally, a deep learning architecture integrating a CNN-based autoencoder for visual feature learning and a ConvLSTM network for tactile data processing is trained to predict user-assigned attribute ratings. This multi-modal, multi-feature approach maps physical signals to perceptual ratings, enabling accurate predictions for unseen textures. To evaluate predictive accuracy, we employed leave-one-out cross-validation to rigorously assess the model's reliability and generalizability against several machine learning and deep learning baselines. Experimental results demonstrate that the framework consistently outperforms single-modality approaches, achieving lower MAE and RMSE, highlighting the efficacy of combining visual and tactile modalities.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.