ReAct: Reason + Act
Yao, S., Zhao, J., Yu, D., Du, N., Shafran, I., Narasimhan, K., & Cao, Y. (2022). ReAct: Synergizing reasoning and acting in language models. arXiv preprint arXiv:2210.03629.
Read Original Paper
The 2022 'ReAct' paper introduced a prompting framework that lets Large Language Models interleave reasoning traces with task-specific actions. Where previous approaches focused on either internal reasoning (Chain-of-Thought) or external acting (e.g., web browsing), researchers at Princeton and Google Brain showed that combining the two produces a synergistic effect: the model uses 'thoughts' to plan and 'actions' to ground its reasoning in reality. It marked a shift from viewing a model as a static black box to viewing it as an agentic system capable of dynamic, multi-step interaction with its environment.
The Thought-Action-Observation Loop

The ReAct loop: interleaving reasoning traces (thoughts) with external actions and observations.
The technical shift in ReAct was the formalization of an agentic loop consisting of three distinct steps: Thought, Action, and Observation. In this cycle, the model first generates a 'thought': a free-form text string used to decompose a goal or track progress. It then outputs an 'action,' which is a specific command for an external environment, such as searching a database or moving an object in a virtual space. Finally, it receives an 'observation' from that environment and incorporates it back into its context window. This structure allows the model to 'reason to act' and 'act to reason,' creating a more robust decision-making process than either pure reasoning or pure acting could achieve alone. It revealed that the most effective way to solve complex, multi-step problems is to treat language as a medium for both internal reflection and external interface.
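The loop described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: `llm` stands in for any function that completes a prompt with the next thought or action, and `env` for any function that executes an action string and returns an observation.

```python
def react_loop(llm, env, task, max_steps=8):
    """Interleave thoughts, actions, and observations in one growing context."""
    context = f"Task: {task}\n"
    for _ in range(max_steps):
        thought = llm(context + "Thought:")        # free-form planning text
        context += f"Thought: {thought}\n"
        action = llm(context + "Action:")          # command for the environment
        context += f"Action: {action}\n"
        if action.startswith("finish["):           # terminal action carries the answer
            return action[len("finish["):-1]
        observation = env(action)                  # external feedback grounds the next thought
        context += f"Observation: {observation}\n"
    return None                                    # gave up within the step budget
```

Note that the entire history, thoughts included, is appended to the context, so each new step is conditioned on everything the agent has reasoned and observed so far.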
Grounded Reasoning and Hallucination
The reasoning behind ReAct was to solve the persistent problem of hallucination in Large Language Models. In fact-heavy tasks like HotpotQA, traditional Chain-of-Thought models often 'make up' facts because they rely solely on their internal, static weights. ReAct forces the model to be 'grounded' by requiring it to retrieve specific sentences from Wikipedia before making a claim. A manual study showed that over half of the errors in standard reasoning models were caused by hallucination, whereas ReAct's grounded approach significantly reduced these failures. This proved that a model's accuracy is not just a function of its size, but of its ability to verify its own internal state against an external source of truth. It suggested that true intelligence requires the humility to look for more information when internal confidence is low.
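In the paper's HotpotQA setup, grounding is enforced through a small action space of three commands: `search[entity]`, `lookup[string]`, and `finish[answer]`. A sketch of a dispatcher for these actions follows; the `wiki_search` and `wiki_lookup` retrieval helpers are hypothetical stand-ins for a real Wikipedia API.

```python
import re

def dispatch(action, wiki_search, wiki_lookup):
    """Route a ReAct action string to the matching environment command."""
    match = re.fullmatch(r"(search|lookup|finish)\[(.+)\]", action.strip())
    if not match:
        return "Invalid action."      # malformed output is fed back as an observation
    command, arg = match.groups()
    if command == "search":
        return wiki_search(arg)       # e.g., first sentences of the matching page
    if command == "lookup":
        return wiki_lookup(arg)       # e.g., next sentence containing the string
    return ("FINISH", arg)            # terminal: arg is the model's answer
```

Because every factual claim must pass through `search` or `lookup` before `finish`, the model cannot answer from its weights alone, which is exactly the grounding constraint the paper credits for reducing hallucination.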
Dynamic Reasoning-Driven Retrieval
A critical technical detail is ReAct's use of 'dynamic reasoning-driven retrieval,' which differs from standard Retrieval-Augmented Generation (RAG). In a standard RAG system, documents are fetched once at the beginning of the query. In ReAct, the model reads a small snippet, reasons about what is still missing, and then formulates a new, more specific search query to find the next piece of the puzzle. This allows for 'multi-hop' reasoning, where the answer to one question leads the model to ask a second, more complex question. This iterative process proved that the power of an LLM is not in its ability to memorize a database, but in its ability to navigate one. It raised the question of whether the future of knowledge retrieval lies in better search algorithms or in better agents that know how to use the existing ones.
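The multi-hop pattern can be made concrete with the paper's Colorado orogeny example: the first observation names the entity that the second query must search for. The tiny `kb` dictionary and the regex standing in for the model's reasoning are illustrative assumptions, not the actual Wikipedia text or prompts.

```python
import re

# Toy knowledge base standing in for Wikipedia.
kb = {
    "Colorado orogeny": "The eastern sector extends into the High Plains.",
    "High Plains": "The High Plains rise from 1,800 to 7,000 ft in elevation.",
}

def retrieve(entity):
    return kb.get(entity, "No page found.")

# Hop 1: a one-shot RAG fetch answers only part of the question.
first = retrieve("Colorado orogeny")

# Hop 2: reasoning over the observation yields a new, more specific query --
# this is the step a static, fetch-once RAG pipeline cannot take.
next_entity = re.search(r"extends into the (.+)\.", first).group(1)
second = retrieve(next_entity)
```

A fetch-once RAG system stops after hop 1 and never learns the elevation; the ReAct-style agent reaches it because each retrieval is driven by reasoning over the previous observation.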
Scaling Decision Making
ReAct's success was also demonstrated in complex decision-making environments like ALFWorld, a text-based game requiring long-horizon planning. While simpler models often get stuck in repetitive loops or fail to plan more than one step ahead, ReAct uses 'sparse thoughts' to apply commonsense reasoning to its actions, such as deciding to check the fridge for lettuce before looking in the bedroom. This approach led to a 34% absolute success rate improvement over non-reasoning baselines. It revealed that 'intelligence' in an agent is the ability to maintain a coherent internal plan while being flexible enough to react to unexpected feedback. It remains to be seen if this loop can be scaled to even more complex, real-world physical environments without a significant increase in computational latency.
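The 'sparse thoughts' idea can be sketched as a trace in which thinking is a step the agent may take without touching the environment, rather than a mandatory prefix to every action. The plan contents and the `think:` convention below are illustrative assumptions, not the paper's prompts.

```python
def sparse_agent(plan, env_step):
    """Execute a plan in which 'think:' items are reasoning-only steps."""
    trace = []
    for item in plan:
        if item.startswith("think:"):
            # A thought updates the agent's internal plan; the environment is untouched.
            trace.append(("thought", item[len("think:"):].strip()))
        else:
            obs = env_step(item)               # an action changes the environment
            trace.append(("action", item, obs))
    return trace

plan = [
    "think: lettuce is likely in the fridge, so check there first",
    "go to fridge",
    "open fridge",
]
```

Only one thought appears for several actions, mirroring how ReAct in ALFWorld reasons at decision points rather than before every move.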
Dive Deeper
ReAct Project Page
Princeton • docs
Explore Resource
ReAct Paper on arXiv
arXiv • article
Explore Resource