Parallel Knowledge Embedding with MapReduce on a Multi-core Processor
Abstract
This article is a first attempt to explore parallel algorithms for learning distributed representations of both entities and relations in large-scale knowledge repositories, using the MapReduce programming model on a multi-core processor. We accelerate the training process of a canonical knowledge embedding method, the translating embedding (TransE) model, by dividing a whole knowledge repository into several balanced subsets and feeding each subset to an individual core, where local embeddings are updated concurrently during the Map phase. However, this approach usually yields inconsistent low-dimensional vector representations of the same key, collected from different Map workers, which leads to conflicts when Reduce merges the various vectors associated with that key. Therefore, we try several strategies for acquiring merged embeddings that not only retain the performance of single-thread TransE on entity inference, relation prediction, and triplet classification, evaluated on several well-known knowledge bases such as Freebase and NELL, but also scale the learning speed with the number of cores within a processor. So far, our empirical studies show that we can achieve results comparable to those of single-thread TransE trained with the stochastic gradient descent (SGD) algorithm, and can increase the training speed several times over by adapting the batch gradient descent (BGD) algorithm to the MapReduce paradigm.
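The Map and Reduce phases described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not say which merge strategies were tried, so plain averaging of conflicting vectors is an assumption here, as are the squared L2 distance, tail-only negative sampling, and every function name (`split_balanced`, `map_worker`, `reduce_average`, `train_epoch`). The Map calls run in a simple loop for clarity; since each call is independent, they could be dispatched to one process per core.

```python
import random
from collections import defaultdict


def split_balanced(triplets, n_workers):
    """Deal triplets round-robin so subset sizes differ by at most one."""
    return [triplets[i::n_workers] for i in range(n_workers)]


def map_worker(subset, embeddings, entity_keys, lr, margin, seed):
    """Map phase: one SGD pass of a TransE-style update on a local copy.

    Emits (key, vector) pairs for every embedding the worker changed.
    """
    rng = random.Random(seed)
    local = {k: list(v) for k, v in embeddings.items()}
    touched = set()
    for h, r, t in subset:
        hk, rk, tk = ("ent", h), ("rel", r), ("ent", t)
        nk = rng.choice(entity_keys)  # corrupted tail (assumed sampling scheme)
        if nk in (hk, tk):
            continue
        pos = [a + b - c for a, b, c in zip(local[hk], local[rk], local[tk])]
        neg = [a + b - c for a, b, c in zip(local[hk], local[rk], local[nk])]
        # Margin ranking loss on squared L2 distances.
        loss = margin + sum(x * x for x in pos) - sum(x * x for x in neg)
        if loss <= 0:
            continue
        for i, (p, n) in enumerate(zip(pos, neg)):
            grad = 2 * (p - n)
            local[hk][i] -= lr * grad
            local[rk][i] -= lr * grad
            local[tk][i] += lr * 2 * p
            local[nk][i] -= lr * 2 * n
        touched.update((hk, rk, tk, nk))
    return [(k, local[k]) for k in touched]


def reduce_average(pairs):
    """Reduce phase: resolve conflicts by averaging vectors that share a key."""
    grouped = defaultdict(list)
    for key, vec in pairs:
        grouped[key].append(vec)
    return {
        key: [sum(col) / len(vecs) for col in zip(*vecs)]
        for key, vecs in grouped.items()
    }


def train_epoch(triplets, embeddings, n_workers=4, lr=0.01, margin=1.0):
    """One MapReduce round: split, run Map workers, then merge with Reduce."""
    entity_keys = sorted(k for k in embeddings if k[0] == "ent")
    pairs = []
    for worker_id, subset in enumerate(split_balanced(triplets, n_workers)):
        # Each call is independent and could run on its own core.
        pairs.extend(
            map_worker(subset, embeddings, entity_keys, lr, margin, worker_id)
        )
    merged = dict(embeddings)  # keys no worker touched keep their old vectors
    merged.update(reduce_average(pairs))
    return merged
```

Embeddings are keyed by `("ent", name)` or `("rel", name)` so that an entity and a relation with the same name never collide during Reduce. Averaging is only one possible way to resolve the conflicts the abstract describes; other merge rules could replace `reduce_average` without changing the rest of the sketch.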