From Language to Locomotion: Retargeting-free Humanoid Control via Motion Latent Guidance

์ €์ž: Zhe Li, Cheng Chi, Yangyang Wei, Boan Zhu, Yibo Peng, Tao Huang, Pengwei Wang, Zhongyuan Wang, Shanghang Zhang, Chang Xu | ๋‚ ์งœ: 2025-10-17 | DOI: 10.48550/arXiv.2510.14952 📄 PDF


Essence

Figure 2: Overview of RoboGhost. We propose a two-stage approach: a motion latent is first generated, then a

RoboGhost is a retargeting-free framework that converts language instructions directly into executable motions for humanoid robots. By using a diffusion-based policy conditioned on motion latents, it eliminates the accumulated errors and latency of conventional multi-stage pipelines.
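To make the idea in the summary above concrete, here is a minimal sketch of a two-stage, latent-conditioned diffusion policy. It is not the authors' implementation: every function name, the toy target, the linear noise schedule, and the oracle "noise predictor" standing in for a trained network are assumptions made purely for illustration. The sketch uses standard DDPM ancestral sampling, which the review does not confirm as the paper's exact sampler.

```python
import math
import random


def toy_text_to_latent(prompt, dim):
    """Stage 1 stand-in (hypothetical): map a text prompt to a motion latent."""
    rng = random.Random(sum(ord(c) for c in prompt))  # deterministic per prompt
    return [rng.uniform(-1.0, 1.0) for _ in range(dim)]


def toy_target_action(latent, proprio):
    """Toy 'clean' action the fake denoiser steers toward (illustrative only)."""
    bias = 0.1 * sum(proprio) / len(proprio)
    return [math.tanh(z) + bias for z in latent]


def make_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linear DDPM noise schedule (needs num_steps >= 2)."""
    betas = [beta_start + (beta_end - beta_start) * t / (num_steps - 1)
             for t in range(num_steps)]
    alphas = [1.0 - b for b in betas]
    alpha_bars, prod = [], 1.0
    for a in alphas:
        prod *= a
        alpha_bars.append(prod)
    return betas, alphas, alpha_bars


def toy_noise_predictor(noisy_action, alpha_bar_t, latent, proprio):
    """Stand-in for a learned eps_theta(a_t, t, z, s): an oracle for the toy target."""
    target = toy_target_action(latent, proprio)
    return [(a - math.sqrt(alpha_bar_t) * a0) / math.sqrt(1.0 - alpha_bar_t)
            for a, a0 in zip(noisy_action, target)]


def sample_action(latent, proprio, num_steps=20, seed=0):
    """Stage 2: DDPM ancestral sampling of an action conditioned on the latent."""
    rng = random.Random(seed)
    betas, alphas, alpha_bars = make_schedule(num_steps)
    action = [rng.gauss(0.0, 1.0) for _ in latent]  # start from pure noise
    for t in reversed(range(num_steps)):
        eps = toy_noise_predictor(action, alpha_bars[t], latent, proprio)
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        mean = [(a - coef * e) / math.sqrt(alphas[t]) for a, e in zip(action, eps)]
        sigma = math.sqrt(betas[t]) if t > 0 else 0.0  # no noise on the last step
        action = [m + sigma * rng.gauss(0.0, 1.0) for m in mean]
    return action


latent = toy_text_to_latent("walk forward", dim=4)
action = sample_action(latent, proprio=[0.0, 0.1, -0.2])
```

The point of the sketch is the data flow: the policy consumes the latent directly as a conditioning signal, so no explicit motion is decoded and retargeted to the robot in between. In a real system the oracle predictor would be replaced by a trained network.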

Motivation

Achievement

Figure 1

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: RoboGhost๋Š” language-guided humanoid ์ œ์–ด์˜ ๊ทผ๋ณธ์ ์ธ ํŒŒ์ดํ”„๋ผ์ธ ์žฌ์„ค๊ณ„๋ฅผ ํ†ตํ•ด ๊ธฐ์กด์˜ ๋‹ค๋‹จ๊ณ„ ์ ‘๊ทผ์˜ ํ•œ๊ณ„๋ฅผ ํšจ๊ณผ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋ฉฐ, ์‹ค์ œ ๋กœ๋ด‡ ๋ฐฐํฌ์—์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ์ž…์ฆํ•œ ๋งค์šฐ ์˜ํ–ฅ๋ ฅ ์žˆ๋Š” ์—ฐ๊ตฌ์ด๋‹ค. ๋‹ค๋งŒ ํ•ด์„์„ฑ ๊ฐ•ํ™”์™€ ๋ณต์žกํ•œ task๋กœ์˜ ํ™•์žฅ์ด ํ›„์† ๊ณผ์ œ๋กœ ๋‚จ์•„์žˆ๋‹ค.
