Author ORCID Identifier

Kexin Zhao: 0009-0000-6658-6727

Jamie C. Macbeth: 0000-0003-3474-203X

Document Type

Article

Publication Date

12-2025

Publication Title

Cognitive Systems Research

Abstract

The capabilities of large language models (LLMs) have rarely been assessed against those of classical, symbolic AI systems for natural language generation and natural language understanding. This paper assesses the understanding and reasoning capabilities of a large language model by probing it with SHRDLU, a rule-based, symbolic natural language understanding system in which a human user issues commands to a robot that grasps and moves objects in a virtual “blocks world” environment. We perform a study in which we prompt an LLM with SHRDLU human-robot interaction dialogs and simple questions about the locations of objects at the conclusion of each dialog. In these tests of GPT-4’s understanding of spatial and containment relationships and its ability to reason about complex scenarios involving object manipulation, we find that GPT-4 performs well on basic tasks but struggles with complex spatial relationships and object tracking, with accuracy as low as 16% in certain conditions with longer dialogs. Although GPT-4, a state-of-the-art LLM, appears to be no match for SHRDLU, one of the earliest natural language understanding systems, this study is an important initial step toward future systems that may achieve the best of both neural and symbolic worlds.
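(For readers unfamiliar with this kind of probing setup, the following is a minimal sketch, not the authors' exact protocol, of how a SHRDLU-style dialog and an end-of-dialog location question might be sent to GPT-4 via the OpenAI Python client. The dialog transcript, question text, and system prompt below are hypothetical illustrations.)

```python
# Minimal sketch: prompt GPT-4 with a SHRDLU-style blocks-world dialog and a
# simple question about an object's final location. Assumes the openai Python
# package (v1+) is installed and OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()

# Hypothetical SHRDLU-style human-robot interaction transcript.
dialog = (
    "Person: Pick up a big red block.\n"
    "Robot: OK.\n"
    "Person: Put it in the box.\n"
    "Robot: OK.\n"
)

# Simple end-of-dialog question about an object's location.
question = "At the end of this dialog, where is the big red block?"

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "Answer questions about the following blocks-world dialog."},
        {"role": "user", "content": dialog + "\n" + question},
    ],
)

print(response.choices[0].message.content)
```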

Keywords

Blocks world, Natural language understanding, SHRDLU, Large language models

Volume

94

First Page

101421

DOI

https://doi.org/10.1016/j.cogsys.2025.101421

Creative Commons License

Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License
This work is licensed under a Creative Commons Attribution-Noncommercial-No Derivative Works 3.0 License.

Version

Author's Accepted Manuscript

Comments

Link to read published article online

Available for download on Wednesday, December 01, 2027
