A Pragmatic VLA Foundation Model

Authors: Wei Wu, Fan Lu, Yunnan Wang, Shuai Yang, Shi Liu, Fangjing Wang, Qian Zhu, He Sun, Yong Wang, Shuailei Ma, Yiyu Ren, Kejia Zhang, Hui Yu, Jingmei Zhao, Shuai Zhou, Zhenqi Qiu, Houlong Xiong, Ziyu Wang, Zechen Wang, Ran Cheng, Yong-Lu Li, Yongtao Huang, Xing Zhu, Yujun Shen, Kecheng Zheng | Date: 2026-01-26 | URL: https://arxiv.org/abs/2601.18692


Essence

Figure 1. Overview of LingBot-VLA. We scale dual-arm robot data collected in the real world for pre-training. LingBot-VL

LingBot-VLA is a Vision-Language-Action foundation model trained on roughly 20,000 hours of real-world robot data, offering efficient training and generalization across multiple platforms.

Motivation

Achievement

Figure 2. Visualization of pre-training dataset used by LingBot-VLA.

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: LingBot-VLA is the first to empirically demonstrate scaling behavior in real-world robot learning. It presents a practical, generalizable VLA foundation model built on large-scale, diverse data and an efficient training infrastructure, and its open-source release is a notable contribution to the robot learning community.
