Supply Chain Propagation of Textual Signals: LLM Embeddings and Cross-Sectional Return Predictability

Abstract

This paper proposes a novel asset pricing framework that augments large language model (LLM) embeddings of annual report disclosures with supply chain knowledge graph (KG) propagation. Using FinBERT embeddings of 10-K MD&A sections for 255 S&P 500 firms over 2011-2025, two sets of return predictors are constructed: direct LLM embeddings and network-augmented embeddings, where firm-level signals propagate through inter-firm linkages. Fama-MacBeth cross-sectional regressions reveal that the network-augmented factor (netpc5) carries significant return predictability with a Newey-West t-statistic of -2.64, even after controlling for momentum, volatility, and firm size. A long-short portfolio sorted on netpc5 achieves an annualized Sharpe ratio of 0.86 and a Fama-French five-factor alpha of 7.27% per year (t = 2.30). The predictive power survives out-of-sample tests, placebo experiments, sector-neutralization, and subsample analysis. The findings suggest that inter-firm network structure contains pricing-relevant information beyond firm-level textual disclosures.
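The pipeline described in the abstract (propagating firm-level embeddings through inter-firm linkages, then testing predictors with Fama-MacBeth regressions and Newey-West t-statistics) can be illustrated with a minimal sketch. The abstract does not specify the propagation rule, the blending weight, or the lag length, so the one-hop averaging scheme and the names `propagate`, `fama_macbeth`, `newey_west_t`, `alpha`, and `lags` below are all assumptions made for illustration, not the authors' implementation.

```python
import numpy as np

def propagate(embeddings, adjacency, alpha=0.5):
    """One-hop propagation (assumed scheme): blend each firm's own embedding
    with the weighted average of its linked firms' embeddings."""
    embeddings = np.asarray(embeddings, dtype=float)
    adjacency = np.asarray(adjacency, dtype=float)
    row_sums = adjacency.sum(axis=1, keepdims=True)
    # Row-normalise link weights; rows with no links stay all zero.
    weights = np.divide(adjacency, row_sums,
                        out=np.zeros_like(adjacency), where=row_sums > 0)
    # Firms without any link fall back to their own embedding.
    neighbour_avg = np.where(row_sums > 0, weights @ embeddings, embeddings)
    return (1 - alpha) * embeddings + alpha * neighbour_avg

def fama_macbeth(returns, signals):
    """Period-by-period cross-sectional OLS.
    returns: (T, N) array; signals: (T, N, K) array. Returns (T, K) slopes."""
    T, N, K = signals.shape
    slopes = np.empty((T, K))
    for t in range(T):
        X = np.column_stack([np.ones(N), signals[t]])  # intercept + predictors
        beta, *_ = np.linalg.lstsq(X, returns[t], rcond=None)
        slopes[t] = beta[1:]  # drop the intercept
    return slopes

def newey_west_t(series, lags=4):
    """t-statistic of the time-series mean using a Bartlett-kernel
    (Newey-West) long-run variance estimate."""
    x = np.asarray(series, dtype=float)
    T = x.size
    d = x - x.mean()
    var = d @ d / T
    for l in range(1, lags + 1):
        w = 1 - l / (lags + 1)  # Bartlett weight
        var += 2 * w * (d[l:] @ d[:-l]) / T
    return x.mean() / np.sqrt(var / T)
```

In this sketch, the time series of slopes returned by `fama_macbeth` for a given predictor would be passed to `newey_west_t` to obtain a statistic comparable in kind to the one reported for netpc5. Falling back to a firm's own embedding when it has no links is a design choice of the sketch, made so that unlinked firms are not shrunk toward zero.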
