SimpleVLA-RL: Scaling VLA Training via Reinforcement Learning

์ €์ž: Haozhan Li, Yuxin Zuo, Jiale Yu, Yuhao Zhang, Zhaohui Yang, Kaiyan Zhang, Xuekai Zhu, Yuchen Zhang, Tianxing Chen, Ganqu Cui, Dehui Wang, Dingxiang Luo, Yuchen Fan, Youbang Sun, Jia Zeng, Jiangmiao Pang, Shanghang Zhang, Yu Wang, Yao Mu, Bowen Zhou, Ning Ding | ๋‚ ์งœ: 2025-09-11 | URL: https://arxiv.org/abs/2509.09674 📄 PDF


Essence

Figure 1

Figure 1 | Overview of SimpleVLA-RL. SimpleVLA-RL is an efficient RL framework for VLA that im-

SimpleVLA-RL์€ Vision-Language-Action ๋ชจ๋ธ์˜ ํ•™์Šต์„ ๊ฐ•ํ™”ํ•™์Šต(RL)์„ ํ†ตํ•ด ํ™•์žฅํ•˜๋Š” ํšจ์œจ์ ์ธ ํ”„๋ ˆ์ž„์›Œํฌ๋กœ, ๋ฐ์ดํ„ฐ ๋ถ€์กฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ์‹ค์ œ ๋กœ๋ด‡ ์ž‘์—…์—์„œ SFT๋ฅผ ๋Šฅ๊ฐ€ํ•˜๋Š” ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•œ๋‹ค.

Motivation

Achievement


How

Figure 2

Figure 2 | Overview of SimpleVLA-RL.

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: SimpleVLA-RL์€ RL์„ VLA ํ•™์Šต์— ํšจ๊ณผ์ ์œผ๋กœ ์ ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ๋ถ€์กฑ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๊ณ  ์‹ค์ œ ๋กœ๋ด‡ ์„ฑ๋Šฅ์„ ํ–ฅ์ƒ์‹œํ‚จ ์ค‘์š”ํ•œ ๊ธฐ์—ฌ์ด๋ฉฐ, "pushcut" ํ˜„์ƒ์˜ ๋ฐœ๊ฒฌ์€ ์ƒˆ๋กœ์šด ์—ฐ๊ตฌ ๋ฐฉํ–ฅ์„ ์ œ์‹œํ•œ๋‹ค. ๋‹ค๋งŒ ๊ณ„์‚ฐ ๋น„์šฉ๊ณผ ์‹ค์ œ ํ™˜๊ฒฝ ๊ฒ€์ฆ์˜ ํ™•๋Œ€๊ฐ€ ํ–ฅํ›„ ๊ณผ์ œ์ด๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •