RationalVLA: A Rational Vision-Language-Action Model with Dual System

Authors: Wenxuan Song, Jiayi Chen, Wenxue Li, Xu He, Han Zhao, Can Cui, Pengxiang Ding, Shiyan Su, Feilong Tang, Xuelian Cheng, Donglin Wang, Zongyuan Ge, Xinhu Zheng, Zhe Liu, Hesheng Wang, Haoang Li | Date: 2025-06-12 | URL: https://arxiv.org/abs/2506.10826


Essence

Fig. 1.

The paper proposes RationalVLA, a model that can reject instructions the robot cannot execute, and introduces RAMA, a benchmark of defective instructions spanning six dimensions, to evaluate this ability.

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: RationalVLA is an innovative work that systematically addresses the handling of defective instructions, a capability that matters in real-world robot deployment but has so far been overlooked. By combining the RAMA benchmark with a dual-system architecture, it effectively integrates language understanding with manipulation ability.
