Collective Alignment in LLM Multi-Agent Systems: Disentangling Bias from Cooperation via Statistical Physics
Abstract
We investigate the emergent collective dynamics of LLM-based multi-agent systems on a 2D square lattice and present a model-agnostic statistical-physics method to disentangle social conformity from intrinsic bias, compute critical exponents, and probe the collective behavior and possible phase transitions of multi-agent systems. In our framework, each node of an L × L lattice hosts an identical LLM agent holding a binary state (+1/-1, mapped to yes/no) and updating it by querying the model conditioned on the four nearest-neighbor states. The sampler temperature T serves as the sole control parameter. Across three open-weight models (llama3.1:8b, phi4-mini:3.8b, mistral:7b), we measure magnetization and susceptibility under a global-flip protocol designed to probe Z2 symmetry. All models display temperature-driven order-disorder crossovers and susceptibility peaks; finite-size scaling on even-L lattices yields effective exponents γ/ν whose values are model-dependent, close to but incompatible with the 2D Ising universality class (γ/ν = 7/4). Our method enables the extraction of effective β-weighted couplings J(T) and fields h(T), which serve as a measure of social conformity and intrinsic bias. In the models we analyzed, we found that collective alignment is dominated by an intrinsic bias (h ≫ J) rather than by cooperative neighbor coupling, producing field-driven crossovers instead of genuine phase transitions. These effective parameters vary qualitatively across models, providing compact collective-behavior fingerprints for LLM agents and a quantitative diagnostic for the reliability of multi-agent consensus and collective alignment.
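To make the setup in the abstract concrete, here is a minimal Python sketch of the lattice update loop and of one possible way to read off an effective coupling and field. It rests on assumptions the abstract does not confirm: the LLM query is replaced by a hypothetical `surrogate_agent` that answers +1 with the heat-bath probability 1 / (1 + exp(-2(J·n + h))), where n is the sum of the four nearest-neighbor states, and the hypothetical `fit_J_h` recovers J and h from a straight-line fit to the empirical log-odds. The function names, the periodic boundaries, the random-site update order, and the fitting procedure are all illustrative choices, not the authors' stated method.

```python
import math
import random


def neighbor_sum(grid, i, j):
    """Sum of the four nearest-neighbor states (periodic boundaries assumed)."""
    L = len(grid)
    return (grid[(i - 1) % L][j] + grid[(i + 1) % L][j]
            + grid[i][(j - 1) % L] + grid[i][(j + 1) % L])


def surrogate_agent(n, J, h, rng):
    """Hypothetical stand-in for the LLM query: +1 with heat-bath probability."""
    p_up = 1.0 / (1.0 + math.exp(-2.0 * (J * n + h)))
    return 1 if rng.random() < p_up else -1


def simulate(L=8, J=0.1, h=0.3, sweeps=400, seed=0):
    """Run random-site updates; return final magnetization and response counts."""
    rng = random.Random(seed)
    grid = [[rng.choice((-1, 1)) for _ in range(L)] for _ in range(L)]
    counts = {}  # neighbor sum n -> [number of +1 answers, number of queries]
    for _ in range(sweeps * L * L):
        i, j = rng.randrange(L), rng.randrange(L)
        n = neighbor_sum(grid, i, j)
        s = surrogate_agent(n, J, h, rng)
        grid[i][j] = s
        entry = counts.setdefault(n, [0, 0])
        entry[0] += (s == 1)
        entry[1] += 1
    magnetization = sum(map(sum, grid)) / (L * L)
    return magnetization, counts


def fit_J_h(counts):
    """Weighted least-squares fit of logit p(+1 | n) = 2*J*n + 2*h."""
    pts = []
    for n, (up, total) in counts.items():
        if 0 < up < total:  # skip bins where the log-odds is undefined
            p = up / total
            pts.append((n, math.log(p / (1 - p)), total * p * (1 - p)))
    if len(pts) < 2:
        raise ValueError("need at least two neighbor sums with mixed outcomes")
    w_sum = sum(w for _, _, w in pts)
    x_bar = sum(w * x for x, _, w in pts) / w_sum
    y_bar = sum(w * y for _, y, w in pts) / w_sum
    slope = (sum(w * (x - x_bar) * (y - y_bar) for x, y, w in pts)
             / sum(w * (x - x_bar) ** 2 for x, _, w in pts))
    intercept = y_bar - slope * x_bar
    return slope / 2.0, intercept / 2.0


if __name__ == "__main__":
    m, counts = simulate()
    J_hat, h_hat = fit_J_h(counts)
    print(f"m = {m:.3f}, J = {J_hat:.3f}, h = {h_hat:.3f}")
```

In this toy picture, the slope of the log-odds against the neighbor sum plays the role of social conformity (J) and the intercept plays the role of intrinsic bias (h). With a real model in place of `surrogate_agent`, the fit would be repeated at each sampler temperature T to trace out J(T) and h(T).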