EgoScale: Scaling Dexterous Manipulation with Diverse Egocentric Human Data

์ €์ž: Ruijie Zheng, Dantong Niu, Yuqi Xie, Jing Wang, Mengda Xu, Yunfan Jiang, Fernando Castaรฑeda, Fengyuan Hu, You Liang Tan, Letian Fu, Trevor Darrell, Furong Huang, Yuke Zhu, Danfei Xu, Linxi Fan | ๋‚ ์งœ: 2026-02-18 | URL: https://arxiv.org/abs/2602.16710 📄 PDF


Essence

Figure 1: EgoScale: Two-stage human-to-robot learning framework. A flow-based Vision-Language-Action
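The caption describes the policy as a flow-based VLA. The sketch below illustrates what "flow-based" usually means for action generation: minimal conditional flow matching in pure Python. It assumes a linear (rectified-flow) path between Gaussian noise and an action chunk. This review does not give the paper's exact objective, time sampling, or network. `velocity_fn` stands in for the learned network. `make_oracle` is a hypothetical closed-form field for one known target, included only so the example runs without training. `obs` is a placeholder for the vision-language conditioning.

```python
import random

def interpolate(noise, action, t):
    """Linear path x_t = (1 - t) * noise + t * action, with t in [0, 1]."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]

def target_velocity(noise, action):
    """Time derivative of the linear path: d x_t / dt = action - noise."""
    return [a - n for n, a in zip(noise, action)]

def flow_matching_loss(velocity_fn, noise, action, t, obs):
    """Mean squared error between predicted and target velocity at x_t."""
    x_t = interpolate(noise, action, t)
    pred = velocity_fn(x_t, t, obs)
    target = target_velocity(noise, action)
    return sum((p - u) ** 2 for p, u in zip(pred, target)) / len(target)

def sample_action(velocity_fn, noise, obs, steps=10):
    """Euler-integrate dx/dt = v(x, t, obs) from t = 0 (noise) toward t = 1 (action)."""
    x = list(noise)
    dt = 1.0 / steps
    for k in range(steps):
        t = k * dt  # always < 1, so the oracle below never divides by zero
        v = velocity_fn(x, t, obs)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x

def make_oracle(target_action):
    """Hypothetical closed-form velocity field for ONE known target (stands in for a trained network)."""
    def velocity_fn(x, t, obs):  # obs is ignored in this toy
        return [(a - xi) / (1.0 - t) for a, xi in zip(target_action, x)]
    return velocity_fn

rng = random.Random(0)
action_chunk = [0.1, -0.2, 0.3, 0.5]  # hypothetical flattened action chunk
noise = [rng.gauss(0.0, 1.0) for _ in action_chunk]
oracle = make_oracle(action_chunk)
loss = flow_matching_loss(oracle, noise, action_chunk, t=0.3, obs=None)
sampled = sample_action(oracle, noise, obs=None, steps=10)
```

With a trained network in place of `make_oracle`, `flow_matching_loss` would be averaged over random `t` and minibatches during training. `sample_action` is the part that runs at inference, where the number of integration steps trades speed for accuracy.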

The authors pretrain a VLA model on 20,854 hours of large-scale egocentric human video. They then fine-tune it on a small amount of aligned human-robot mid-training data. On a robot with a 22-DoF dexterous hand, this yields a 54% improvement in success rate.
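The sketch below shows the mechanics of the two-stage schedule described above, with plain SGD on a toy linear model standing in for the VLA. The `Stage` class, stage names, learning rates, and the synthetic "human" and "robot" datasets are all hypothetical. The only point illustrated is that stage 2 (mid-training) starts from the weights produced by stage 1 (pretraining), not from scratch.

```python
import random
from dataclasses import dataclass

@dataclass
class Stage:
    name: str    # hypothetical stage label
    data: list   # list of (x, y) pairs
    lr: float    # SGD step size for this stage
    epochs: int  # passes over this stage's data

def sgd_linear(params, data, lr, epochs, rng):
    """Plain SGD on squared error for y ~ w * x + b, starting from `params`."""
    w, b = params
    for _ in range(epochs):
        batch = list(data)
        rng.shuffle(batch)
        for x, y in batch:
            err = (w * x + b) - y
            w -= lr * err * x
            b -= lr * err
    return w, b

def run_stages(stages, init=(0.0, 0.0), seed=0):
    """Train through the stages in order; each stage warm-starts from the previous one."""
    rng = random.Random(seed)
    params = init
    history = []
    for stage in stages:
        params = sgd_linear(params, stage.data, stage.lr, stage.epochs, rng)
        history.append((stage.name, params))
    return params, history

# Hypothetical synthetic data: many "human" samples and a handful of aligned "robot" samples.
# The robot targets differ from the human ones by a constant offset (a stand-in for an embodiment gap).
data_rng = random.Random(1)
human = [(x, 2.0 * x) for x in (data_rng.uniform(-1.0, 1.0) for _ in range(200))]
robot = [(x, 2.0 * x + 0.5) for x in (-1.0, -0.75, -0.5, -0.25, 0.25, 0.5, 0.75, 1.0)]

stages = [
    Stage("pretrain_human", human, lr=0.1, epochs=5),
    Stage("midtrain_aligned", robot, lr=0.1, epochs=50),
]
final, history = run_stages(stages)
for name, (w, b) in history:
    print(f"{name}: w={w:.3f}, b={b:.3f}")
```

Stage 1 learns the shared slope from the large dataset. Stage 2 only has to correct the offset from eight samples. This toy setup says nothing about the paper's actual data or results.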

Motivation

Achievement

Figure 4: Main Experimental Results. Comparison of Human Pre-train + Mid-Training, Human Pretraining,

How

Figure 2: Human Data Collection and Model Architecture. (Left) Aligned human-robot mid-training data

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ ๋Œ€๊ทœ๋ชจ ์ด๊ณ ์„ผํŠธ๋ฆญ ์ธ๊ฐ„ ๋ฐ์ดํ„ฐ์˜ ์Šค์ผ€์ผ๋ง ๋ฒ•์น™์„ ์ตœ์ดˆ๋กœ ์ž…์ฆํ•˜๊ณ  ์ด๋ฅผ ๊ณ ์ž์œ ๋„ ์†๊ฐ€๋ฝ ์กฐ์ž‘์— ํšจ๊ณผ์ ์œผ๋กœ ์ ์šฉํ•œ ์ค‘์š”ํ•œ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค. ๋ช…ํ™•ํ•œ ์‹คํ—˜ ์„ค๊ณ„์™€ ๊ฐ•๋ ฅํ•œ ์‹ค์ฆ ๊ฒฐ๊ณผ(54% ์„ฑ๊ณต๋ฅ  ํ–ฅ์ƒ, ์ผํšŒ์„ฑ ์ „์ด)๋Š” ์ธ๊ฐ„ ๋ฐ์ดํ„ฐ ๊ธฐ๋ฐ˜ ๋กœ๋ด‡ ํ•™์Šต์˜ ์‹คํ–‰ ๊ฐ€๋Šฅ์„ฑ์„ ํ™•์‹คํžˆ ๋ณด์—ฌ์ฃผ์ง€๋งŒ, ํฌ์ฆˆ ์ถ”์ • ๋…ธ์ด์ฆˆ, ์ค‘๊ฐ„ํ•™์Šต ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ๋น„์šฉ, ํƒœ์Šคํฌ/ํ”Œ๋žซํผ ๋‹ค์–‘์„ฑ ์ œํ•œ์ด ์‹ค์ œ ๋ฐฐํฌ ํ™•๋Œ€๋ฅผ ์œ„ํ•ด ํ•ด๊ฒฐํ•ด์•ผ ํ•  ๊ณผ์ œ๋กœ ๋‚จ์•„์žˆ๋‹ค.
