The Undecidability of Artificial General Intelligence (AGI) Alignment

Abstract

This article establishes fundamental mathematical limits on Artificial General Intelligence (AGI) safety, proving that the core barrier is not that an aligned state is impossible, but that alignment is structurally unverifiable. We formalize this boundary through two central impossibility results: the Unverifiability Theorem of Alignment and the Theorem of Finite Structural Unverifiability of AGI Alignment. We locate this boundary at Trakhtenbrot's Wall, demonstrating that contemporary engineering defenses that rely on finite hardware or halting architectures do not escape the logical obstructions. The failure takes the form of a triad of containment failures: in open domains, verification is undecidable (Rice and Gödel); universal verification over finite structures is uncomputable (Trakhtenbrot); and in particular bounded environments, the supervisor faces intractable verification costs in the worst case. As a direct structural corollary of these results, we derive the Soundness–Completeness–Tractability Trilemma, establishing that the mutual incompatibility of these three properties is a necessary consequence of descriptive complexity rather than an empirical anomaly. Finally, we map these theoretical bounds onto practical AI engineering, demonstrating that modern containment strategies are not temporary patches but mandatory sacrifices of logical expressivity, made to secure decidable fragments of safety.
