InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation

์ €์ž: Junhao Cai, Zetao Cai, Jiafei Cao, Yilun Chen, Zeyu He, Lei Jiang, Hang Li, Hengjie Li, Yang Li, Yufei Liu, Yanan Lu, Qi Lv, Haoxiang Ma, Jiangmiao Pang, Yu Qiao, Zherui Qiu, Yanqing Shen, Xu Shi, Yang Tian, Bolun Wang, Hanqing Wang, Jiaheng Wang, Tai Wang, Xueyuan Wei, Chao Wu, Yiman Xie, Boyang Xing, Yuqiang Yang, Yuyin Yang, Qiaojun Yu, Feng Yuan, Jia Zeng, Jingjing Zhang, Shenghan Zhang, Shi Zhang, Zhuoma Zhaxi, Bowen Zhou, Yuanzhen Zhou, Yunsong Zhou, Hongrui Zhu, Yangkun Zhu, Yuchen Zhu | ๋‚ ์งœ: 2026-01-05 | URL: https://arxiv.org/abs/2601.02456 📄 PDF


Essence

Figure 1. InternVLA-A1 unifies scene understanding, visual foresight generation, and action execution

InternVLA-A1 is a Vision-Language-Action model that improves robotic manipulation performance by unifying semantic understanding, visual prediction, and action execution in a Mixture-of-Transformers architecture. It is pretrained on 692M frames of heterogeneous data, comprising real-world robot data, synthetic simulation data, and human videos, and achieves a 26.7% performance improvement on dynamic manipulation tasks.

Motivation

Achievement

How

Figure 2. Framework of InternVLA-A1. The architecture comprises three experts: (1) an under-
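
To make the Mixture-of-Transformers idea concrete, here is a toy single-layer sketch in which the three expert names follow the Figure 1 caption (understanding, generation, action). Everything else is an assumption for illustration: the function names `init_expert` and `mot_layer`, the dimensions, the random weights, single-head unmasked attention, and the absence of normalization. It follows the general Mixture-of-Transformers pattern (each token uses its own expert's projections and feed-forward weights while attention runs jointly over the whole sequence) and should not be read as the authors' implementation.

```python
import numpy as np

# Expert names follow the Figure 1 caption; the rest is hypothetical.
EXPERTS = ("understanding", "generation", "action")


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def init_expert(rng, dim):
    """Random per-expert weights: Q/K/V/output projections and a 2-layer FFN."""
    s = 1.0 / np.sqrt(dim)
    return {
        "wq": rng.normal(0.0, s, (dim, dim)),
        "wk": rng.normal(0.0, s, (dim, dim)),
        "wv": rng.normal(0.0, s, (dim, dim)),
        "wo": rng.normal(0.0, s, (dim, dim)),
        "w1": rng.normal(0.0, s, (dim, 4 * dim)),
        "w2": rng.normal(0.0, 1.0 / np.sqrt(4 * dim), (4 * dim, dim)),
    }


def mot_layer(tokens, expert_ids, params):
    """One toy layer. tokens: (n, dim); expert_ids: (n,) indices into EXPERTS."""
    dim = tokens.shape[1]
    q = np.empty_like(tokens)
    k = np.empty_like(tokens)
    v = np.empty_like(tokens)

    # Each token is projected with the weights of the expert it belongs to.
    for i, name in enumerate(EXPERTS):
        mask = expert_ids == i
        p = params[name]
        q[mask] = tokens[mask] @ p["wq"]
        k[mask] = tokens[mask] @ p["wk"]
        v[mask] = tokens[mask] @ p["wv"]

    # Attention is shared: every token attends over all tokens of all experts.
    # (Assumption: no masking; a real model may restrict who attends to whom.)
    attn = softmax(q @ k.T / np.sqrt(dim), axis=-1)
    mixed = attn @ v

    # Output projection and feed-forward are again expert-specific.
    out = np.empty_like(tokens)
    for i, name in enumerate(EXPERTS):
        mask = expert_ids == i
        p = params[name]
        h = tokens[mask] + mixed[mask] @ p["wo"]
        out[mask] = h + np.maximum(h @ p["w1"], 0.0) @ p["w2"]
    return out


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = {name: init_expert(rng, 8) for name in EXPERTS}
    tokens = rng.normal(size=(6, 8))
    # Three understanding tokens, two generation tokens, one action token.
    expert_ids = np.array([0, 0, 0, 1, 1, 2])
    print(mot_layer(tokens, expert_ids, params).shape)  # (6, 8)
```

The point of the sketch is the split of responsibilities: information flows between the three token groups only through the shared attention step, while the weights that transform each group stay separate.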

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: InternVLA-A1์€ ์˜๋ฏธ ์ดํ•ด์™€ ๋™์  ์˜ˆ์ธก์„ ํ†ตํ•ฉํ•˜๋Š” ํ˜์‹ ์  ์•„ํ‚คํ…์ฒ˜์™€ ์ด์งˆ์  ๋ฐ์ดํ„ฐ source์˜ ํšจ๊ณผ์  ํ™œ์šฉ์œผ๋กœ ๋กœ๋ด‡ ์กฐ์ž‘์˜ ์ผ๋ฐ˜ํ™” ๋ฌธ์ œ๋ฅผ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œ์ผฐ๋‹ค. ํŠนํžˆ ๋™์  ํ™˜๊ฒฝ์—์„œ์˜ 26.7% ์„ฑ๋Šฅ ํ–ฅ์ƒ์€ ์‹ค์„ธ๊ณ„ ์‘์šฉ์˜ ์ค‘์š”ํ•œ ์ง„์ „์„ ๋ณด์—ฌ์ฃผ๋ฉฐ, VLA ๋ถ„์•ผ์˜ ์ฃผ์š” ๊ธฐ์—ฌ์ด๋‹ค.
