The Significance of User Characteristics for Reposting Prediction on X: A Comparative Analysis Under Distribution Shift

Abstract

Understanding information diffusion on X (formerly Twitter) requires accurate modelling of reposting behaviour. Most existing work predicts reposting in in-distribution settings, where training and test data cover the same topics. This paper addresses a more realistic and challenging scenario: out-of-distribution prediction, i.e., forecasting reposting behaviour for new, previously unseen topics. We formulate the task at the individual level (predicting whether a specific user will repost a given post) and systematically compare the predictive power of post-related features, user-related features, and their combination across four representative models: Decision Tree, Multi-Layer Perceptron, BERT, and Qwen. Our experiments show that while post-related features perform well in-distribution, their performance declines drastically on unseen topics, with F1 scores falling to approximately 0.12. In contrast, user-related features (user profiles, social relations, and historical behaviour) deliver strong and transferable performance, raising the F1 score to over 0.70. These results demonstrate that reposting decisions are largely content-agnostic: they are driven more by stable user characteristics than by the specific content of a post. Our findings highlight the value of user modelling for building robust prediction systems and offer new insights into the mechanisms that allow information to spread across topics.
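The out-of-distribution setting hinges on splitting data by topic rather than at random, so that no test topic is ever seen during training, and on scoring the positive (repost) class with F1. The snippet below is a minimal sketch of that protocol, assuming each (user, post) pair carries exactly one topic label. `RepostExample`, `split_by_topic` and `f1_score` are hypothetical names used for illustration, not the authors' code. The models and feature sets themselves are omitted.

```python
from dataclasses import dataclass


@dataclass
class RepostExample:
    # Hypothetical record: one (user, post) pair, label 1 if the user reposted.
    user_id: str
    post_id: str
    topic: str
    label: int


def split_by_topic(examples, test_topics):
    """Out-of-distribution split: all pairs whose topic is held out go to test."""
    held_out = set(test_topics)
    train = [e for e in examples if e.topic not in held_out]
    test = [e for e in examples if e.topic in held_out]
    return train, test


def f1_score(y_true, y_pred):
    """Binary F1 for the positive (repost) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    # Toy, made-up pairs purely to exercise the functions.
    data = [
        RepostExample("u1", "p1", "politics", 1),
        RepostExample("u2", "p1", "politics", 0),
        RepostExample("u1", "p2", "sports", 1),
        RepostExample("u3", "p2", "sports", 0),
    ]
    train, test = split_by_topic(data, test_topics={"sports"})
    # No topic that appears at test time appears in training data.
    assert not {e.topic for e in train} & {e.topic for e in test}
    print(len(train), len(test))  # 2 2
    print(f1_score([1, 1, 0, 0], [1, 0, 1, 0]))  # 0.5
```

A random split would typically place pairs from the same topic on both sides, which is the in-distribution condition; holding out whole topics is what makes the test topics "previously unseen". For grouped splits at scale, a library utility such as scikit-learn's group-aware splitters could replace `split_by_topic`.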
