Authors: Shaofeng Yin, Yanjie Ze, Hong-Xing Yu, C. Karen Liu, Jiajun Wu | Date: 2025-11-13 | DOI: 10.48550/arXiv.2509.20322
Fig. 2: VisualMimic consists of two training stages: 1) training a general keypoint tracker, where a teacher motion trac
VisualMimic is a sim-to-real framework that combines egocentric vision with hierarchical whole-body control. It achieves humanoid loco-manipulation through two components: a task-agnostic keypoint tracker trained on human motion data, and a task-specific visuomotor policy.
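The two-level structure described above can be pictured with a minimal sketch. It is not the authors' implementation: the class name `HierarchicalController`, the two placeholder policies, and all vector sizes are hypothetical and chosen only to show how a high-level visuomotor policy might hand keypoint commands to a low-level tracker.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class HierarchicalController:
    """Hypothetical two-level controller (illustrative only)."""

    # High level: (vision features, proprioception) -> keypoint command
    visuomotor_policy: Callable[[Vector, Vector], Vector]
    # Low level: (keypoint command, proprioception) -> joint targets
    keypoint_tracker: Callable[[Vector, Vector], Vector]

    def step(self, vision: Vector, proprio: Vector) -> Vector:
        # The task-specific policy decides where the keypoints should go.
        keypoints = self.visuomotor_policy(vision, proprio)
        # The task-agnostic tracker turns that command into joint targets.
        return self.keypoint_tracker(keypoints, proprio)


NUM_KEYPOINT_DIMS = 3  # assumed size, for illustration only


def dummy_visuomotor_policy(vision: Vector, proprio: Vector) -> Vector:
    # Placeholder: broadcast the mean vision feature to every keypoint dim.
    mean = sum(vision) / len(vision)
    return [mean] * NUM_KEYPOINT_DIMS


def dummy_keypoint_tracker(keypoints: Vector, proprio: Vector) -> Vector:
    # Placeholder: nudge each joint toward a scaled keypoint value.
    return [p + 0.1 * keypoints[i % len(keypoints)] for i, p in enumerate(proprio)]


controller = HierarchicalController(dummy_visuomotor_policy, dummy_keypoint_tracker)
action = controller.step(vision=[0.0, 1.0], proprio=[0.0, 0.0, 0.0, 0.0])
```

In this sketch the low-level tracker never sees the vision input, so the same tracker could in principle be reused across tasks while only the high-level policy changes.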
Fig. 3: Our visuomotor policies generalize across diverse space and time, shown on the box-pushing task.
Overall assessment: VisualMimic effectively addresses practical challenges in humanoid loco-manipulation through a creative two-fold application of teacher-student distillation and constraints based on human motion statistics. It demonstrates zero-shot real-world transfer across a variety of tasks, which makes it a highly meaningful study.