Bad company corrupts good morals: Understanding and Measuring Narrative-Induced Moral Reasoning Degradation in LLMs

Yue Zhang

Abstract

Large language models are increasingly deployed in long-context, emotionally interactive environments such as digital humans, AI companions, educational assistants, and counseling systems. Unlike jailbreak settings, which rely on explicit adversarial prompts, these deployments routinely expose models to emotionally charged narratives involving bullying, betrayal, loneliness, social hostility, and institutional unfairness. This raises an important question: can prolonged narrative exposure reshape the reasoning and alignment stability of LLMs? We present the first systematic study of narrative-induced alignment degradation in LLMs. We design BreakingBad, a three-stage framework that measures how negative narrative immersion affects moral reasoning, behavior, and deployment risk. It combines ethical decision evaluation, behavioral probing, and digital-human interaction analysis. Our experiments reveal three findings. First, negative narrative exposure degrades moral accuracy across multiple LLMs, with average drops of 12%-31%, especially in ambiguous scenarios and those involving vulnerable individuals. Second, the degradation is structured: different narratives induce distinct shifts, and first-person narratives produce stronger effects than third-person ones. Third, these shifts propagate into real deployments. Across counseling, education, medical, and financial/legal scenarios, narrative-conditioned models increasingly normalize hopelessness, cynicism, emotional detachment, and ethically questionable reasoning while remaining superficially policy-compliant. More broadly, our findings suggest that alignment robustness is not a static property but a dynamically conditioned state shaped by long-term semantic environments and interaction history. These results reveal a new class of alignment risk that existing safety defenses largely fail to capture.
