Reasoning Abilities of Large Language Models: In-Depth Analysis on the Abstraction and Reasoning Corpus

Abstract

The existing methods for evaluating the inference abilities of Large Language Models (LLMs) have been predominantly results-centric, making it challenging to assess the inference process comprehensively. We introduce a novel approach using the Abstraction and Reasoning Corpus (ARC) benchmark to evaluate the inference and contextual understanding abilities of LLMs in a process-centric manner, focusing on three key components from the Language of Thought Hypothesis (LoTH): Logical Coherence, Compositionality, and Productivity. Our carefully designed experiments reveal that while LLMs demonstrate some inference capabilities, they still significantly lag behind human-level reasoning in these three aspects. The main contribution of this paper lies in introducing the LoTH perspective, which provides a method for evaluating the reasoning process that conventional results-oriented approaches fail to capture, thereby offering new insights into the development of human-level reasoning in artificial intelligence systems.

0

Turn this paper into a full lesson

ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.

Discussion (0)

Sign in to join the discussion.

Loading comments…