Document Type
Article
Publication Date
12-2025
Publication Title
Cognitive Systems Research
Abstract
The capabilities of large language models (LLMs) have rarely been assessed against those of classical, symbolic AI systems for natural language generation and natural language understanding. This paper assesses the understanding and reasoning capabilities of a large language model by probing it with SHRDLU, a rule-based, symbolic natural language understanding system in which a human user issues commands to a robot that grasps and moves objects in a virtual “blocks world” environment. We perform a study in which we prompt an LLM with SHRDLU human-robot interaction dialogs and simple questions about the locations of objects at the conclusion of each dialog. In these tests of GPT-4’s understanding of spatial and containment relationships and its ability to reason about complex scenarios involving object manipulation, we find that GPT-4 performs well on basic tasks but struggles with complex spatial relationships and object tracking, with accuracy as low as 16% under certain conditions with longer dialogs. Although GPT-4, a state-of-the-art LLM, appears to be no match for SHRDLU, one of the earliest natural language understanding systems, this study is an important initial step toward future systems that may achieve the best of both the neural and symbolic worlds.
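To make the probing setup concrete, the sketch below shows one way a single trial of this kind could be run against the OpenAI chat API: a SHRDLU-style dialog transcript is supplied as context, followed by a question about an object's final location. This is an illustrative sketch, not the authors' actual harness; the dialog text, prompt wording, and expected answer are placeholders rather than the study's materials.

# Minimal sketch of one probing trial, assuming the OpenAI Python client.
# The dialog, question, and expected answer below are illustrative
# placeholders, not the actual study materials.
from openai import OpenAI

client = OpenAI()

dialog = """Person: Pick up a big red block.
Computer: OK.
Person: Put it in the box.
Computer: OK."""

question = "At the end of this dialog, where is the big red block?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You will read a blocks-world dialog between a person "
                    "and a robot, then answer a question about the final "
                    "locations of objects."},
        {"role": "user", "content": f"{dialog}\n\n{question}"},
    ],
)

# A study of this kind would score the reply against the ground-truth
# location (here, "in the box") and aggregate accuracy across dialogs.
print(response.choices[0].message.content)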
Keywords
Blocks world, Natural language understanding, SHRDLU, Large language models
Volume
94
First Page
101421
DOI
https://doi.org/10.1016/j.cogsys.2025.101421
Creative Commons License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.
Version
Author's Accepted Manuscript
Recommended Citation
Zhao, Kexin and Macbeth, Jamie C., "Probing the Reasoning Abilities of LLMs in Blocks World" (2025). Computer Science: Faculty Publications, Smith College, Northampton, MA.
https://scholarworks.smith.edu/csc_facpubs/426
