Empirical Growing Networks vs Minimal Models: Evidence and Challenges from Software Heritage and APS Citation Datasets
Abstract
We investigate the evolution rules and degree distribution properties of the Software Heritage dataset, a large-scale growing network linking software source-code versions from open-source communities. The network spans more than 40 years and includes about 6 billion nodes and edges. Our analysis relies on deterministic temporal and topological partitions of nodes and edges, which account for the multilayer and partially timestamped structure of the main graph. We derive a temporal graph that reveals a mesoscale structure and enables the study of edge dynamics (creation, inheritance, and aging), together with comparisons to minimal models using degree distributions and histograms of edge timestamp differences. The temporal graph also exposes regime shifts that correlate with changes in developer practices, as reflected in the average number of edges per new node. We estimate scaling exponents under the scale-free hypothesis and highlight the sensitivity of the estimation method to both regime shifts and outliers, while showing that partitioning improves regularity and helps disentangle these effects. We extend the analysis to the APS citation network, which also exhibits a major regime shift, with an accelerated growth regime becoming dominant after 1985. Although both datasets are a priori good candidates for advanced quantitative analysis, our results illustrate how structural and dynamical transitions hamper our ability to draw firm conclusions about the existence and observability of a scale-free regime in these empirical networks. These findings underscore the need for refined tools and models to study transient growth regimes, to extend current frameworks toward minimal causal growth models, and to enable robust comparisons between empirical growing networks and minimal models.
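To make the point about regime shifts concrete, the following is a minimal sketch and not the estimation method used in the paper, which the abstract does not specify. It assumes a continuous power-law tail and uses the standard maximum-likelihood (Hill-type) estimator on synthetic data. The function names, the exponents 3.0 and 2.2, the sample sizes, and the cutoff xmin = 1 are all illustrative assumptions. Pooling two regimes with different exponents yields a single intermediate estimate, whereas estimating on each partition separately recovers each regime's exponent.

```python
import math
import random


def sample_power_law(n, alpha, xmin, rng):
    """Draw n continuous samples with density proportional to x**(-alpha) for x >= xmin."""
    # Inverse-transform sampling; 1 - rng.random() lies in (0, 1], so the base is never zero.
    return [xmin * (1.0 - rng.random()) ** (-1.0 / (alpha - 1.0)) for _ in range(n)]


def estimate_alpha(values, xmin):
    """Continuous maximum-likelihood (Hill-type) estimate of the exponent above xmin."""
    tail = [v for v in values if v >= xmin]
    log_sum = sum(math.log(v / xmin) for v in tail)
    return 1.0 + len(tail) / log_sum


# Two synthetic "regimes" with different exponents (hypothetical values, not from the paper).
rng = random.Random(0)
early = sample_power_law(20000, 3.0, 1.0, rng)
late = sample_power_law(20000, 2.2, 1.0, rng)

alpha_early = estimate_alpha(early, 1.0)          # estimate on the first partition only
alpha_late = estimate_alpha(late, 1.0)            # estimate on the second partition only
alpha_pooled = estimate_alpha(early + late, 1.0)  # estimate ignoring the regime shift

print(f"early: {alpha_early:.2f}  late: {alpha_late:.2f}  pooled: {alpha_pooled:.2f}")
```

The pooled value matches neither regime, which is one simple way a regime shift can bias a scaling-exponent estimate. Real degree data are discrete and require choosing xmin, so this sketch understates the difficulty.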