Robot Learning in the Era of Foundation Models: A Survey

์ €์ž: Xuan Xiao, Jiahang Liu, Zhipeng Wang, Yanmin Zhou, Yong Qi, Qian Cheng, Bin He, Shuo Jiang | ๋‚ ์งœ: 2023-11-24 | URL: https://arxiv.org/abs/2311.14379 📄 PDF


Essence

Figure 1

Fig.1. Overall structure of the survey.

์ด ๋…ผ๋ฌธ์€ Large Language Models(LLMs)๊ณผ multimodal foundation models๋ฅผ ๋กœ๋ด‡ ํ•™์Šต์— ์ ์šฉํ•˜๋Š” ์ตœ์‹  ๊ธฐ์ˆ ์„ ์ฒด๊ณ„์ ์œผ๋กœ ์กฐ์‚ฌํ•˜๋Š” survey์ด๋ฉฐ, manipulation, navigation, planning, reasoning์˜ ๋„ค ๊ฐ€์ง€ ์ฃผ์š” ์˜์—ญ์—์„œ foundation model ๊ธฐ๋ฒ•์˜ ์ ์šฉ ๋ฐฉ์‹์„ ๋ถ„์„ํ•œ๋‹ค.

Motivation

Achievement

How

Figure 2

Fig.2. Technical evolution [27-30].

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ์ด ๋…ผ๋ฌธ์€ LLMs์™€ multimodal foundation models์˜ ๋กœ๋ด‡ ํ•™์Šต ์ ์šฉ์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํ•™์ œ๊ฐ„ ๋ถ„์•ผ๋ฅผ ์ฒด๊ณ„์ ์œผ๋กœ ์ •๋ฆฌํ•œ ์ค‘์š”ํ•œ survey๋กœ์„œ, ๊ธฐ์ˆ  ์ง„ํ™” ๋‹จ๊ณ„ํ™”, ๋„ค ๊ฐ€์ง€ ์ฃผ์š” ์ž‘์—… ์˜์—ญ ๋ถ„๋ฅ˜, ๊ทธ๋ฆฌ๊ณ  ๋ฏธํ•ด๊ฒฐ ์‹ค์ œ ๋ฌธ์ œ์˜ ๋ช…์‹œ์  ๊ทœ๋ช…์„ ํ†ตํ•ด ํ–ฅํ›„ embodied AI ์—ฐ๊ตฌ์˜ ๋กœ๋“œ๋งต์„ ์ œ์‹œํ•œ๋‹ค. ๋‹ค๋งŒ ๊ตฌ์ฒด์ ์ธ ๊ธฐ์ˆ ์  ํ•ด๋ฒ•๊ณผ ์ •๋Ÿ‰์  ์„ฑ๋Šฅ ๋น„๊ต๊ฐ€ ๋ถ€์กฑํ•˜์—ฌ ์‹ค์ œ ๊ตฌํ˜„ ๋‹จ๊ณ„์˜ ์—ฐ๊ตฌ์ž๋“ค์„ ์œ„ํ•œ ๊ฐ€์ด๋“œ๋กœ์„œ์˜ ์—ญํ• ์€ ์ œํ•œ์ ์ด๋‹ค.