HumaniBench: A Human-Centric Framework for Large Multimodal Models Evaluation

Abstract

Although recent large multimodal models (LMMs) show impressive progress on vision-language tasks, their alignment with human-centered (HC) principles such as fairness, ethics, inclusivity, empathy, and robustness is often overlooked. Existing LMM benchmarks focus largely on accuracy. We present HumaniBench, a unified framework for characterizing HC alignment across realistic, socially grounded visual contexts. It contains 32,000 expert-verified image-question pairs from real-world news imagery, each mapped to one or more HC principles through explicit metrics. Comparing 15 state-of-the-art LMMs reveals consistent trade-offs: proprietary systems lead on ethics, reasoning, and empathy, while open-source models show superior visual grounding and resilience. All models show persistent gaps in fairness and multilingual inclusivity. Chain-of-thought prompting and test-time scaling yield 8 to 12% gains on several HC dimensions. HumaniBench enables fine-grained analysis of alignment trade-offs not captured by conventional multimodal benchmarks. https://vectorinstitute.github.io/humanibench/
