Embodied-Reasoner: Synergizing Visual Search, Reasoning, and Action for Embodied Interactive Tasks

Authors: Wenqi Zhang, Mengna Wang, Gangao Liu, Xu Huixin, Yiwei Jiang, Yongliang Shen, Guiyang Hou, Zhe Zheng, Hang Zhang, Xin Li, Weiming Lu, Peng Li, Yueting Zhuang | Date: 2025-03-27 | URL: https://arxiv.org/abs/2503.21696


Essence

Figure 1.

The paper extends the o1-style deep reasoning paradigm to embodied interactive tasks and presents Embodied-Reasoner, a model that integrates visual search, reasoning, and action. Using 9.3k Observation-Thought-Action trajectories and a three-stage training pipeline, the authors develop a model capable of spatial understanding, temporal reasoning, and self-reflection.

Motivation

Achievement

Figure 2. Embodied-Reasoner exhibits spontaneous thinking behaviors, e.g., analyzing environmental states (#1,3), reflec

How

Figure 3. Left: Data Engine for synthesis. First, we synthesize instructions from

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is the first to systematically extend deep reasoning models to embodied AI, filling an important research gap, and its experiments show clear performance gains. Future work still needs a larger dataset, a broader evaluation scope, and further validation in real-world environments.
