ReAct: Synergizing Reasoning and Acting in Language Models

Authors: Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, Yuan Cao | Date: 2022-10-06 | URL: https://arxiv.org/abs/2210.03629


Essence

Figure 1

Figure 1: (1) Comparison of 4 prompting methods, (a) Standard, (b) Chain-of-thought (CoT,

ReAct is a framework in which a large language model generates reasoning traces and task-specific actions in an interleaved manner. The resulting synergy between reasoning and acting improves performance on a range of language understanding and decision-making tasks.
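The interleaving described above can be sketched as a small loop. This is only an illustrative sketch, not the paper's prompts or code: `scripted_model` is a hypothetical stand-in for an LLM call, `search` and `TOY_KB` are a hypothetical toy tool and data table, and the `(thought, action, argument, observation)` step layout is an assumption made for readability.

```python
from typing import Callable, Dict, List, Optional, Tuple

# One recorded step: (thought, action name, action argument, observation).
Step = Tuple[str, str, str, str]

# Hypothetical in-memory table standing in for an external knowledge source.
TOY_KB: Dict[str, str] = {"react paper year": "2022"}


def search(query: str) -> str:
    """Toy 'Search' action: exact-match lookup in TOY_KB."""
    return TOY_KB.get(query.lower(), "no result")


def scripted_model(question: str, steps: List[Step]) -> Tuple[str, str, str]:
    """Hypothetical stand-in for an LLM; returns (thought, action, argument)."""
    if not steps:
        # No observation yet: reason about what is missing, then act.
        return ("I need the publication year, so I should search.",
                "Search", "ReAct paper year")
    last_observation = steps[-1][3]
    return (f"The search returned {last_observation}, so I can answer.",
            "Finish", last_observation)


def react_loop(
    question: str,
    model: Callable[[str, List[Step]], Tuple[str, str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> Tuple[Optional[str], List[Step]]:
    """Alternate thought -> action -> observation until the model finishes."""
    steps: List[Step] = []
    for _ in range(max_steps):
        thought, action, argument = model(question, steps)
        if action == "Finish":
            return argument, steps
        tool = tools.get(action)
        observation = tool(argument) if tool else f"unknown action: {action}"
        # The observation is fed back so the next thought can build on it.
        steps.append((thought, action, argument, observation))
    return None, steps  # step budget exhausted without an answer


answer, steps = react_loop(
    "In what year was the ReAct paper posted?",
    scripted_model,
    {"Search": search},
)
print(answer)  # prints 2022
```

In a real setting the scripted function would be replaced by a prompted language model and the lookup table by an actual tool. The loop structure, where each observation feeds into the next thought, is the part this sketch is meant to illustrate.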

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: ReAct is an important framework that combines LLM reasoning and acting in a groundbreaking way, reducing hallucination and improving interpretability. Using few-shot prompting alone, it outperforms methods that rely on large-scale training. Its validation across a wide range of benchmarks and its clear presentation are expected to give it high impact.

Related papers

Foundational work
Tool-integrated self-verification and the use of natural-language feedback in LLMs underlie the ReAct framework.
Foundational work
Through joint control of reasoning and acting and integration with external tools, the ReAct framework provides a practical implementation basis for augmented LLM capabilities.
Foundational work
ReAct addresses the synergy of LLMs that combine reasoning and acting, and underlies MLCopilot's human-like problem-solving flow.
Foundational work
Introduces the ReAct framework, which integrates reasoning and tool use, and provides the structural background for TREE-PLANNER.
Foundational work
Paper 655 (ReAct) provides a skeletal approach in which an LLM combines reason+act to use external tools programmatically, forming the theoretical foundation for paper 813's methodology of learning tool use on its own.
Foundational work
The Lean-star paper proposes a theoretical basis for applying interleaved generation of thought and action (MR) and incremental reasoning to LLMs.
Foundational work
The ReAct framework provides a methodological basis for combining LLM tool use with chain-of-thought reasoning.
Foundational work
A representative paper introducing the principles of ReAct prompting and the tool-calling mechanism; it is the conceptual background for domain-specific frameworks.
Foundational work
ReAct is an LLM design approach that augments tool-use ability, and it provides a key theoretical basis for ChemToolAgent's tool-augmented approach.
Foundational work
The ReAct (655)-based framework forms the foundation of DrugAgent's modeling of reasoning-acting synergy.
Foundational work
Paper 655's ReAct approach integrates actions (such as search) into the LLM's reasoning process, and is the concept underlying paper 447's ExSearch framework.
Foundational work
Presents the basis for the principle of combining reasoning and acting (verification, reasoning) in LLMs.
Foundational work
ReAct proposes the scheme on which ReSearch is built, namely combining LLM reasoning with action, and helps in understanding its context.
Foundational work
Allows a comparison between the reasoning-tools coupling of the ReTool framework and the strengths and limitations of the ReAct paradigm.
Foundational work
Provides a theoretical basis for combining reasoning-driven actions and applying the ReAct approach in curriculum RL.
Alternative approach
Presents an alternative methodology for LLMs to identify user intent.
Alternative approach
655 (ReAct) presents a framework that combines LLM reasoning and acting, which allows a complementary comparison with 498's evaluation of multi-agent reasoning capability.
Alternative approach
By combining reasoning and acting, the ReAct framework offers a different way to explore the possibility of generating symbolic world models.
Follow-up work
The Self-Refine paper extends ReAct's methodology with a framework of self-feedback and iterative self-improvement.
Follow-up work
Paper 742 addresses automated paper review through chain-of-thought and agent collaboration, an example of extending 655's reasoning-acting synergy framework to actual paper review.
Application
MLCopilot shows a real application of ReAct's combined reasoning-action model structure to automated ML experiments.
Application
Paper 286 proposes a domain-specific ReAct extension specialized for natural-science modeling, offering a concrete example of applying 655's principles to a variety of scientific problems.