Parallels Between VLA Model Post-Training and Human Motor Learning: Progress, Challenges, and Trends

์ €์ž: Tian-Yu Xiang, Ao-Qun Jin, Xiao-Hu Zhou, Mei-Jiang Gui, Xiao-Liang Xie, Shi-Qi Liu, Shuang-Yi Wang, Sheng-Bin Duan, Fu-Chao Xie, Wen-Kai Wang, Si-Cheng Wang, Ling-Yun Li, Tian Tu, Zeng-Guang Hou | ๋‚ ์งœ: 2025-06-26 | URL: https://arxiv.org/abs/2506.20966 📄 PDF


Essence

Fig. 1.

๋ณธ ๋…ผ๋ฌธ์€ Vision-Language-Action (VLA) ๋ชจ๋ธ์˜ post-training ๋ฐฉ๋ฒ•์„ ์ธ๊ฐ„์˜ ์šด๋™ ํ•™์Šต ์ด๋ก (Newell์˜ ์ œ์•ฝ ์ฃผ๋„ ์ด๋ก )์˜ ๊ด€์ ์—์„œ ์ข…ํ•ฉ์ ์œผ๋กœ ๋ถ„์„ํ•˜๊ณ , ํ™˜๊ฒฝ ์ง€๊ฐ, ์‹ ์ฒด ์ธ์‹, ์ž‘์—… ์ดํ•ด, ๋‹ค์ค‘ ์š”์†Œ ํ†ตํ•ฉ์˜ 4๊ฐ€์ง€ ๋ฒ”์ฃผ๋กœ ์ฒด๊ณ„ํ™”ํ•œ ์„ค๋ฌธ ๋…ผ๋ฌธ์ด๋‹ค.

Motivation

Achievement

Fig. 4. Taxonomy of post-training VLA models proposed in this study.

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ VLA model post-training์„ ์ธ๊ฐ„์˜ ์šด๋™ ํ•™์Šต ์ด๋ก ์œผ๋กœ ํ†ตํ•ฉ ๋ถ„์„ํ•œ ์ฐฝ์˜์ ์ธ ์„ค๋ฌธ ๋…ผ๋ฌธ์œผ๋กœ, NeuroAI ํŒจ๋Ÿฌ๋‹ค์ž„์˜ ์ค‘์š”์„ฑ์„ ๊ฐ•์กฐํ•˜๋ฉฐ ๋กœ๋ด‡๊ณตํ•™ ์ปค๋ฎค๋‹ˆํ‹ฐ์— ๋ช…ํ™•ํ•œ ๊ฐ€์ด๋“œ๋ผ์ธ์„ ์ œ๊ณตํ•œ๋‹ค. ๋‹ค๋งŒ ์ด๋ก ์  ํ”„๋ ˆ์ž„์›Œํฌ ์ œ์‹œ ์ค‘์‹ฌ์ด๋ฏ€๋กœ ๊ฐ ๋ฒ”์ฃผ์˜ ๊ตฌ์ฒด์  ๊ธฐ์ˆ  ๋ฐœ์ „๊ณผ ๋ฏธํ•ด๊ฒฐ ๋ฌธ์ œ์— ๋Œ€ํ•œ ์‹ฌํ™” ๋ถ„์„์ด ์ถ”๊ฐ€๋˜๋ฉด ๋”์šฑ ์‹ค๋ฌด์  ๊ฐ€์น˜๊ฐ€ ๋†’์•„์งˆ ๊ฒƒ์ด๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •