The Significance of User Characteristics for Reposting Prediction on X: A Comparative Analysis Under Distribution Shift
Abstract
Understanding information diffusion on X (formerly Twitter) requires accurate modelling of reposting behaviour. Most existing work predicts reposting under in-distribution settings, where training and test data cover the same topics. This paper addresses a more realistic and challenging scenario: out-of-distribution prediction, i.e., forecasting reposting behaviour for new, previously unseen topics. We formulate the task at the individual level - predicting whether a specific user will repost a given post - and systematically compare the predictive power of post-related features, user-related features, and their combination across four representative models: Decision Tree, Multi-Layer Perceptron, BERT, and Qwen. Our experiments show that while post-related features perform well in-distribution, their performance declines drastically for unseen topics, with F1 scores falling to approximately 0.12. In contrast, user-related features - including user profiles, social relations, and historical behaviour - deliver strong and transferable performance, raising the F1 score to over 0.70. These results demonstrate that reposting decisions are largely content-agnostic: they are driven more by stable user characteristics than by the specific content of a post. Our findings highlight the value of user modelling for building robust prediction systems and provide new insights into the mechanisms that enable information to spread across different topics.
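The distinction between in-distribution and out-of-distribution evaluation can be made concrete with a minimal sketch. The code below is not the paper's pipeline: the record fields (`user`, `topic`, `reposted`), the helper names, the toy data, and the per-user majority baseline are all hypothetical, chosen only to show how a topic-held-out split differs from a random split and how F1 is computed on the held-out topic.

```python
import random
from collections import defaultdict

# Toy (user, post) records; field names and values are illustrative assumptions.
records = [
    {"user": "u1", "topic": "sports", "reposted": 1},
    {"user": "u2", "topic": "sports", "reposted": 0},
    {"user": "u1", "topic": "politics", "reposted": 1},
    {"user": "u2", "topic": "politics", "reposted": 0},
    {"user": "u1", "topic": "health", "reposted": 1},
    {"user": "u2", "topic": "health", "reposted": 0},
]

def split_random(rows, test_fraction, seed=0):
    """In-distribution split: test topics can also appear in training."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def split_by_topic(rows, held_out_topics):
    """Out-of-distribution split: held-out topics never appear in training."""
    train = [r for r in rows if r["topic"] not in held_out_topics]
    test = [r for r in rows if r["topic"] in held_out_topics]
    return train, test

def f1_score(y_true, y_pred):
    """Binary F1 for the positive class (1 = reposted)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

def fit_user_baseline(train_rows):
    """Hypothetical user-only baseline: each user's historical repost rate."""
    counts = defaultdict(lambda: [0, 0])  # user -> [reposts, total]
    for r in train_rows:
        counts[r["user"]][0] += r["reposted"]
        counts[r["user"]][1] += 1
    return {u: reposts / total for u, (reposts, total) in counts.items()}

def predict_user_baseline(rates, rows, threshold=0.5):
    """Predict a repost when the user's historical rate meets the threshold."""
    return [1 if rates.get(r["user"], 0.0) >= threshold else 0 for r in rows]

# Hold out one topic entirely, fit on the rest, score on the unseen topic.
train, test = split_by_topic(records, {"health"})
rates = fit_user_baseline(train)
predictions = predict_user_baseline(rates, test)
ood_f1 = f1_score([r["reposted"] for r in test], predictions)
```

The toy data is constructed so that behaviour depends only on the user, so it says nothing about the paper's reported scores. The point is the protocol: a model evaluated with `split_by_topic` sees no training examples from the test topic, which is the setting the abstract calls out-of-distribution.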