Exploring Embodied Multimodal Large Models: Development, Datasets, and Future Directions

Authors: Shoubin Chen, Zehao Wu, Kai Zhang, Chunyu Li, Baiyang Zhang, Fei Ma, Fei Richard Yu, Qingquan Li | Date: 2025-02-21 | URL: https://arxiv.org/abs/2502.15336


Essence

Figure 1: A timeline of research progress in the field of Embodied Perception, Navigation

This paper is a systematic, comprehensive review of Embodied Multimodal Large Models (EMLMs), which combine foundation models such as Large Language Models and Large Vision Models to integrate perception, cognition, and action in physical environments. It analyzes 300 papers and provides the first systematic analysis of the development, datasets, and future directions of EMLMs.

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This review is the first systematic, comprehensive analysis of the EMLM field. It covers the full stack from foundational models to embodied tasks and gives a broad summary of recent research trends. With a clear structure and plentiful examples, it is a highly valuable review that lays out the current state and future directions of this rapidly developing field.
