Auto-Encoding for Shared Cross Domain Feature Representation and Image-to-Image Translation
Abstract
Image-to-image translation is a class of computer vision and pattern recognition problems in which the goal is to learn a mapping between input images from a domain X1 and output images from a domain X2. Current methods use neural networks with an encoder-decoder structure to learn a mapping G: X1 → X2 such that the distributions of images from X2 and G(X1) are identical, where G(X1) = dG(fG(X1)), fG(·) is referred to as the encoder, and dG(·) is referred to as the decoder. Methods that also compute an inverse mapping F: X2 → X1 currently use a separate encoder-decoder pair dF(fF(X2)), or at least a separate decoder dF(·), to do so. Here we introduce a method to perform image-to-image translation across multiple domains using a single encoder-decoder architecture. We use an auto-encoder network which, given an input image X1, first computes a latent domain encoding Zd = fd(X1) and a latent content encoding Zc = fc(X1), where the domain encoding Zd and the content encoding Zc are independent. A decoder network g(Zd, Zc) then creates a reconstruction of the original image, X̂1 = g(Zd, Zc) ≈ X1. Ideally, the domain encoding Zd contains no information regarding the content of the image, and the content encoding Zc contains no information regarding the domain of the image. We use this property of the encodings to find the mapping across domains G: X1 → X2 by simply changing the domain encoding Zd of the decoder's input: G(X1) = g(fd(x2i), fc(X1)), where x2i is the ith observation of X2.
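The domain-swap idea can be illustrated with a deliberately tiny sketch. Everything in it is an assumption made for illustration and is not the paper's model: an "image" is a short list of floats, the domain code is taken to be the mean intensity, the content code is the zero-mean pattern, and the hand-written functions `f_d`, `f_c`, and `g` stand in for the learned domain encoder, content encoder, and shared decoder. The helper names `reconstruct` and `translate` are hypothetical.

```python
from statistics import fmean

def f_d(x):
    """Toy domain encoder (assumption): the domain code is the mean intensity."""
    return fmean(x)

def f_c(x):
    """Toy content encoder (assumption): the content code is the zero-mean pattern."""
    z_d = fmean(x)
    return [v - z_d for v in x]

def g(z_d, z_c):
    """Toy shared decoder: recombines a domain code with a content code."""
    return [z_d + v for v in z_c]

def reconstruct(x):
    """Auto-encoding path: decode an image from its own two codes."""
    return g(f_d(x), f_c(x))

def translate(x1, x2_i):
    """Translation path: keep the content of x1, take the domain code from x2_i."""
    return g(f_d(x2_i), f_c(x1))

x1 = [1.0, 2.0, 3.0]       # sample from the "dark" toy domain
x2_i = [10.0, 10.0, 13.0]  # one observation from the "bright" toy domain

print(reconstruct(x1))      # same encoder and decoder, own domain code
print(translate(x1, x2_i))  # same encoder and decoder, swapped domain code
```

In this toy setting the same decoder serves both reconstruction and translation, and only the domain code passed to it changes. A real model would replace the three hand-written functions with trained networks and would need a training objective that encourages the two codes to be independent.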