Unsupervised Machine Learning of Open Source Russian Twitter Data Reveals Global Scope and Operational Characteristics
Abstract
We developed and applied a collection of statistical methods (unsupervised machine learning) to extract relevant information from a Twitter-supplied data set of alleged Russian troll accounts that allegedly attempted to influence the 2016 US presidential election. These unsupervised methods allow fast identification of (i) emergent language communities within the troll population, (ii) the transnational scope of the operation, and (iii) operational characteristics of trolls that can be used for future identification. Using natural language processing, manifold learning, and Fourier analysis, we identify an operation that targeted not only the 2016 US election but also the French national election and both local and national German elections. We show that the resulting troll population is composed of users with common, but clearly customized, behavioral characteristics.
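As an illustration of how Fourier analysis can expose an operational characteristic such as a regular posting rhythm, the sketch below finds the dominant period in a series of hourly tweet counts. It is not the authors' method: the abstract names the technique only, so the function name `dominant_period_hours`, the hourly binning, and the synthetic data are all assumptions made for this example.

```python
import numpy as np

def dominant_period_hours(hourly_counts):
    """Return the period (in hours) of the strongest non-constant frequency.

    Hypothetical helper for illustration; assumes one count per hour.
    """
    counts = np.asarray(hourly_counts, dtype=float)
    centered = counts - counts.mean()            # remove the zero-frequency term
    power = np.abs(np.fft.rfft(centered)) ** 2   # power spectrum of the activity
    freqs = np.fft.rfftfreq(counts.size, d=1.0)  # frequencies in cycles per hour
    peak = np.argmax(power[1:]) + 1              # skip index 0 (zero frequency)
    return 1.0 / freqs[peak]

# Synthetic data (an assumption, not from the paper): 14 days of hourly
# tweet counts with a daily rhythm.
hours = np.arange(14 * 24)
synthetic_counts = 10 + 5 * np.sin(2 * np.pi * hours / 24)
period = dominant_period_hours(synthetic_counts)
print(f"Dominant posting period: {period:.1f} hours")
```

On real account data, a sharp peak near 24 hours would suggest a daily work schedule, while several accounts sharing the same peaks would hint at common operating hours.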