Vision-Language-Action Models for Robotics: A Review Towards Real-World Applications

์ €์ž: Kento Kawaharazuka, Jihoon Oh, Jun Yamada, Ingmar Posner, Yuke Zhu | ๋‚ ์งœ: 2025.10 | DOI: N/A 📄 PDF


Essence

Figure 1

FIGURE 1. Structure of this survey. Section II outlines the key challenges in developing Vision-Language-Action (VLA) mo

Vision-Language-Action (VLA) models integrate vision, language, and action data to learn policies that let robots generalize across diverse tasks, objects, embodiments, and environments. This survey provides a comprehensive full-stack review of VLAs, spanning architectures, learning paradigms, data collection, and real-world deployment.

Motivation

Achievement


How

Figure 3

FIGURE 3. Structure of Section IV and Section V. The figure summarizes key components of VLA models. The center illustra

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 5/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ์ด ์„œ๋ฒ ์ด๋Š” VLA ๋ถ„์•ผ์˜ ์ฒซ ์ข…ํ•ฉ์  ํ’€์Šคํƒ ๋ฆฌ๋ทฐ๋กœ์„œ, ์‹ค์ œ ๋กœ๋ด‡ ๋ฐฐํฌ์— ํ•„์š”ํ•œ ๋ชจ๋“  ์ธก๋ฉด์„ ๋‹ค๋ฃจ๋Š” ํฌ๊ด„์  ๊ฐ€์ด๋“œ๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ๋น ๋ฅด๊ฒŒ ๋ฐœ์ „ํ•˜๋Š” ๋ถ„์•ผ์˜ ํ˜„ํ™ฉ์„ ์ •๋ฆฌํ•˜๊ณ  ์‹ค๋ฌด์ž๋ฅผ ์œ„ํ•œ ์‹ค์งˆ์  ๊ถŒ์žฅ์‚ฌํ•ญ์„ ์ œ์‹œํ•˜์—ฌ ๋กœ๋ด‡๊ณตํ•™ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์ƒ๋‹นํ•œ ๊ฐ€์น˜๋ฅผ ์ œ๊ณตํ•  ๊ฒƒ์œผ๋กœ ์˜ˆ์ƒ๋œ๋‹ค.
