Inner Monologue: Embodied Reasoning through Planning with Language Models

์ €์ž: Wenlong Huang, Fei Xia, Ted Xiao, Harris Chan, Jacky Liang, Pete Florence, Andy Zeng, Jonathan Tompson, Igor Mordatch, Yevgen Chebotar, Pierre Sermanet, Noah Brown, Tomas Jackson, Linda Luu, Sergey Levine, Karol Hausman, Brian Ichter | ๋‚ ์งœ: 2022-07-12 | URL: https://arxiv.org/abs/2207.05608 📄 PDF


Essence

Figure 1: Inner Monologue enables grounded closed-loop feedback for robot planning with large language models

LLM์„ ๋กœ๋ด‡ ์ œ์–ด์— ํ™œ์šฉํ•  ๋•Œ, ํ™˜๊ฒฝ ํ”ผ๋“œ๋ฐฑ์„ ์ž์—ฐ์–ด๋กœ ์ฃผ์ž…ํ•˜์—ฌ LLM์ด '๋‚ด์  ๋…๋ฐฑ(inner monologue)'์„ ํ˜•์„ฑํ•˜๊ฒŒ ํ•จ์œผ๋กœ์จ ํ๋ฃจํ”„ ๊ณ„ํš ๋ฐ ์ถ”๋ก ์„ ๊ฐ€๋Šฅํ•˜๊ฒŒ ํ•œ๋‹ค. ์ถ”๊ฐ€ ํ•™์Šต ์—†์ด ํ”„๋กฌํ”„ํŒ…๋งŒ์œผ๋กœ ๋ณต์žกํ•œ ์žฅ๊ธฐ ์กฐ์ž‘ ์ž‘์—…์„ ์ˆ˜ํ–‰ํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์—ฌ์ค€๋‹ค.

Motivation

Achievement

Figure 3: Different instantiations of Inner Monologue in three distinct domains – simulated tabletop rearrangement (top)

How

Figure 2: Various types of textual feedback. Success Detection gives task-specific task completion information, Passive

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ LLM ๊ธฐ๋ฐ˜ ๋กœ๋ด‡ ๊ณ„ํš์— ํ๋ฃจํ”„ ํ”ผ๋“œ๋ฐฑ์„ ์ž์—ฐ์–ด๋กœ ํ†ตํ•ฉํ•˜๋Š” ์ฐฝ์˜์ ์ด๊ณ  ์‹ค์šฉ์ ์ธ ์ ‘๊ทผ์„ ์ œ์‹œํ•˜๋ฉฐ, ์ถ”๊ฐ€ ํ•™์Šต ์—†์ด๋„ ๋ณต์žกํ•œ ์‹ค์ œ ์ž‘์—…์„ ์ˆ˜ํ–‰ ๊ฐ€๋Šฅํ•จ์„ ๋‹ค์ˆ˜์˜ ์‹คํ—˜์œผ๋กœ ์ž…์ฆํ–ˆ๋‹ค. ๋‹ค๋งŒ perception ํ”ผ๋“œ๋ฐฑ์˜ ํ’ˆ์งˆ ์˜์กด์„ฑ๊ณผ LLM์˜ ๊ณ ๋น„์šฉยท์ง€์—ฐ ๋ฌธ์ œ๊ฐ€ ์ถ”ํ›„ ๊ฐœ์„  ๊ณผ์ œ์ด๋‹ค.
