Missing links prediction: comparing machine learning with physics-rooted approaches
Abstract
Link prediction is an active line of research within the broader field of network science. Close in scope to network reconstruction, link prediction targets specific connections, with the aim of uncovering the missing ones, as well as predicting those most likely to emerge in the future, from the available information. In this paper, we consider two families of methods, i.e. those rooted in statistical physics and those based upon machine learning: the members of the first family identify missing links as the most probable non-observed ones, the probability coefficients being determined by solving maximum-entropy benchmarks over the accessible network structure; the members of the second family, instead, associate the presence of single edges with explanatory node-specific variables. Running likelihood-based models such as the Configuration Model, or one of its many fitness-based variants, in parallel with the Gradient Boosting Decision Tree algorithm reveals that the former's accuracy is comparable to (and sometimes slightly higher than) the latter's. This result confirms that white-box algorithms are viable competitors to the currently available black-box ones, being computationally faster and more interpretable than the latter.
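As an illustration of the first family, the following is a minimal sketch of how missing links might be ranked with a maximum-entropy Configuration Model. It assumes the undirected binary variant, in which the link probability is p_ij = x_i x_j / (1 + x_i x_j) and the node parameters x_i are chosen so that expected degrees match observed ones. The function names, the damped fixed-point solver and the toy network are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def fit_ubcm(adj, n_iter=1000, tol=1e-10):
    """Estimate node parameters x_i so that expected degrees match observed ones.

    Assumption: a damped fixed-point iteration of the likelihood equations
    k_i = sum_{j != i} x_i x_j / (1 + x_i x_j); other solvers would also work.
    """
    k = adj.sum(axis=1).astype(float)
    x = k / np.sqrt(max(k.sum(), 1.0))  # sparse-limit initial guess
    for _ in range(n_iter):
        # s_i = sum_{j != i} x_j / (1 + x_i x_j)
        terms = x[None, :] / (1.0 + np.outer(x, x))
        np.fill_diagonal(terms, 0.0)
        s = terms.sum(axis=1)
        target = np.divide(k, s, out=np.zeros_like(k), where=s > 0)
        x_new = 0.5 * x + 0.5 * target  # damping, to reduce oscillations
        if np.max(np.abs(x_new - x)) < tol:
            return x_new
        x = x_new
    return x


def link_probabilities(x):
    """Return the matrix p_ij = x_i x_j / (1 + x_i x_j), with a zero diagonal."""
    xx = np.outer(x, x)
    p = xx / (1.0 + xx)
    np.fill_diagonal(p, 0.0)
    return p


def rank_missing_links(adj):
    """List non-observed node pairs, most probable first, as (i, j, p_ij)."""
    p = link_probabilities(fit_ubcm(adj))
    rows, cols = np.triu_indices_from(adj, k=1)
    unobserved = adj[rows, cols] == 0
    rows, cols = rows[unobserved], cols[unobserved]
    order = np.argsort(-p[rows, cols])
    return [(int(rows[t]), int(cols[t]), float(p[rows[t], cols[t]])) for t in order]


# Hypothetical toy network with six nodes (not data from the paper).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (3, 4), (4, 5)]
adj = np.zeros((6, 6), dtype=int)
for i, j in edges:
    adj[i, j] = adj[j, i] = 1

ranked = rank_missing_links(adj)
for i, j, p in ranked[:3]:
    print(f"candidate link ({i}, {j}) with probability {p:.3f}")
```

In this sketch the only information used is the degree sequence, so candidate pairs are scored purely by the product x_i x_j; the fitness-based variants mentioned above would replace the fitted x_i with node-specific observables.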