Training Neural Machine Translation (NMT) Models using Tensor Train Decomposition on TensorFlow (T3F)

Abstract

We implement a Tensor Train layer in the TensorFlow Neural Machine Translation (NMT) model using the t3f library. We perform training runs on the IWSLT English-Vietnamese '15 and WMT German-English '16 datasets with learning rates ∈ {0.0004, 0.0008, 0.0012}, maximum ranks ∈ {2, 4, 8, 16} and a range of core dimensions. We compare against a target BLEU test score of 24.0, obtained by our benchmark run. For the IWSLT English-Vietnamese training, we obtain BLEU test/dev scores of 24.0/21.9 and 24.2/21.9 using core dimensions (2, 2, 256) × (2, 2, 512) with learning rate 0.0012 and rank distributions (1, 4, 4, 1) and (1, 4, 16, 1) respectively. These runs use 113% and 397% of the FLOPs of the benchmark run respectively. We find that, of the parameters surveyed, a higher learning rate and more 'rectangular' core dimensions generally produce higher BLEU scores. For the WMT German-English dataset, we obtain BLEU scores of 24.0/23.8 using core dimensions (4, 4, 128) × (4, 4, 256) with learning rate 0.0012 and rank distribution (1, 2, 2, 1). We discuss the potential for future optimization and application of Tensor Train decomposition to other NMT models.
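To make the core dimensions and rank distributions above more concrete, the following is a minimal sketch of how many values a TT-matrix stores compared with a dense matrix of the same shape. It assumes the standard TT-matrix layout, in which core k has shape (r_{k-1}, m_k, n_k, r_k), and it assumes that "(2, 2, 256) × (2, 2, 512)" lists the row modes m_k and column modes n_k in that order. Neither assumption is confirmed by the abstract, and the function names are hypothetical helpers, not part of t3f or the NMT codebase. The sketch counts stored parameters only; it does not model the FLOP figures reported above.

```python
from math import prod


def tt_matrix_param_count(row_modes, col_modes, ranks):
    """Count the values stored by a TT-matrix.

    Assumes core k has shape (ranks[k], row_modes[k], col_modes[k], ranks[k + 1]),
    which is the usual TT-matrix convention (an assumption, not taken from the paper).
    """
    if len(row_modes) != len(col_modes):
        raise ValueError("row_modes and col_modes must have the same length")
    if len(ranks) != len(row_modes) + 1:
        raise ValueError("ranks must have one more entry than the number of cores")
    if ranks[0] != 1 or ranks[-1] != 1:
        raise ValueError("boundary ranks must both be 1")
    return sum(
        ranks[k] * row_modes[k] * col_modes[k] * ranks[k + 1]
        for k in range(len(row_modes))
    )


def dense_param_count(row_modes, col_modes):
    """Count the values in the dense matrix that the TT-matrix represents."""
    return prod(row_modes) * prod(col_modes)


if __name__ == "__main__":
    # Configurations quoted in the abstract: (row modes, column modes, ranks).
    configs = [
        ((2, 2, 256), (2, 2, 512), (1, 4, 4, 1)),
        ((2, 2, 256), (2, 2, 512), (1, 4, 16, 1)),
        ((4, 4, 128), (4, 4, 256), (1, 2, 2, 1)),
    ]
    for rows, cols, ranks in configs:
        tt = tt_matrix_param_count(rows, cols, ranks)
        dense = dense_param_count(rows, cols)
        print(f"modes {rows} x {cols}, ranks {ranks}: "
              f"TT={tt}, dense={dense}, ratio={tt / dense:.3f}")
```

Under these assumptions, the last core dominates the count because it carries the large modes (256 × 512 or 128 × 256), so the rank adjacent to that core largely determines the storage cost. With ranks (1, 4, 16, 1) the TT-matrix stores slightly more values than the dense matrix it represents.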
