Turning Citation Networks Inside Out: Studying Science Using Content-Based Knowledge Graphs from LLM-Derived Taxonomies

Abstract

Scientific fields are often mapped using citations and metadata, despite knowledge being transmitted primarily through content. We introduce an 'inside-out' approach that reconstructs field structure directly from text by representing each paper as a small set of interpretable knowledge components. Using a large language model to induce domain-specific taxonomies and label papers, each publication is encoded as a triplet of measure, data type, and research-question type. These triplets define a knowledge graph with edges weighted by shared papers. Applied to 617 studies on intergenerational wealth mobility, the graph reveals a stable methodological backbone centered on regression-based mobility measures, alongside substantial temporal variation in component recombination. We further utilize normalized betweenness-to-connectivity ratios to identify components and pairings that act as structural bridges disproportionate to their prevalence. This content-derived, taxonomy-driven mapping complements citation-based approaches by exposing the evolving architecture of methods, data, and questions that define a field.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…