Learning to Diagnose and Correct Errors: Towards Moral Sensitivity Acquisition in Large Language Models

Abstract

Moral sensitivity is the most fundamental capability underlying human moral competence. Although many approaches aim to align large language models (LLMs) with human moral values, they primarily focus on fitting the distributions of morally appropriate texts while overlooking how to enable moral sensitivity acquisition in LLMs. In this paper, we take a step toward addressing the question: How can moral sensitivity be acquired in LLMs? Specifically, we propose a pragmatic inference approach that facilitates moral sensitivity acquisition in LLMs by enabling them to diagnose and correct moral errors. A central strength of our pragmatic inference approach lies in its unified perspective: rather than modeling moral discourses across semantically diverse and complex surface forms, it provides a principled framework for designing pragmatic inference procedures grounded in their inferential load. Empirical evidence demonstrates that our pragmatic approach can enable moral sensitivity acquisition in LLMs and generalizes effectively across tasks.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…