"Mental Rotation" by Optimizing Transforming Distance

Abstract

The human visual system is able to recognize objects despite transformations that can drastically alter their appearance. To this end, much effort has been devoted to the invariance properties of recognition systems. Invariance can be engineered (e.g. convolutional nets), or learned from data explicitly (e.g. temporal coherence) or implicitly (e.g. by data augmentation). One idea that has not, to date, been explored is the integration of latent variables which permit a search over a learned space of transformations. Motivated by evidence that people mentally simulate transformations in space while comparing examples, so-called "mental rotation", we propose a transforming distance. Here, a trained relational model actively transforms pairs of examples so that they are maximally similar in some feature space yet respect the learned transformational constraints. We apply our method to nearest-neighbour problems on the Toronto Face Database and NORB.

0

Turn this paper into a lesson

ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…