Identifying exogenous and endogenous activity in social media
Abstract
The occurrence of new events in a system is typically driven by external causes and by previous events taking place inside the system. This is a general statement, applying to a range of situations including, more recently, to the activity of users in Online social networks (OSNs). Here we develop a method for extracting from a series of posting times the relative contributions of exogenous, e.g. news media, and endogenous, e.g. information cascade. The method is based on the fitting of a generalized linear model (GLM) equipped with a self-excitation mechanism. We test the method with synthetic data generated by a nonlinear Hawkes process, and apply it to a real time series of tweets with a given hashtag. In the empirical dataset, the estimated contributions of exogenous and endogenous volumes are close to the amounts of original tweets and retweets respectively. We conclude by discussing the possible applications of the method, for instance in online marketing.
Turn this paper into a lesson
ArcXiv compiles a structured reading guide from this paper's metadata: plain-English importance, contributions, prerequisite concepts, which sections to read first, flashcards, and a quiz. Grounded in the abstract, never invented.