MetaWorld-X: Hierarchical World Modeling via VLM-Orchestrated Experts for Humanoid Loco-Manipulation

Authors: Yutong Shen, Hangxu Liu, Penghui Liu, Jiashuo Luo, Yongkang Zhang, Rex Morvley, Chen Jiang, Jianwei Zhang, Lei Zhang | Date: 2026-03-09 | URL: https://arxiv.org/abs/2603.08572


Essence

Figure 2

Fig. 2: MetaWorld-X achieves natural humanoid control through the dynamic orchestration of expert policies guided by a

The paper proposes a hierarchical framework that decomposes complex humanoid loco-manipulation control into Specialized Expert Policies (SEP) and reintegrates them through a VLM-based Intelligent Routing Mechanism (IRM). It combines human motion priors with semantic routing to generate natural, stable motion.
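As a rough illustration of the decompose-and-route idea, the sketch below blends the outputs of several expert policies using weights derived from router scores. It is a minimal sketch under stated assumptions, not the paper's implementation: this review does not say how the IRM combines expert outputs, so the soft (softmax-weighted) blending, the hand-set scores standing in for the VLM router, and every name here (`route_action`, `walk_expert`, `reach_expert`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Sequence

Observation = Sequence[float]
Action = List[float]
ExpertPolicy = Callable[[Observation], Action]


def softmax(scores: Dict[str, float]) -> Dict[str, float]:
    """Turn raw router scores into weights that sum to 1."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {name: math.exp(s - peak) for name, s in scores.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}


def route_action(
    obs: Observation,
    experts: Dict[str, ExpertPolicy],
    router_scores: Dict[str, float],
) -> Action:
    """Blend expert actions with softmax weights (assumed blending rule)."""
    weights = softmax(router_scores)
    actions = {name: policy(obs) for name, policy in experts.items()}
    dim = len(next(iter(actions.values())))
    return [
        sum(weights[name] * actions[name][i] for name in experts)
        for i in range(dim)
    ]


# Toy stand-ins for specialized expert policies (hypothetical).
def walk_expert(obs: Observation) -> Action:
    return [1.0, 0.0]


def reach_expert(obs: Observation) -> Action:
    return [0.0, 1.0]


experts = {"walk": walk_expert, "reach": reach_expert}

# In the paper a VLM supplies the routing signal; here the scores are hand-set.
scores = {"walk": 0.0, "reach": 0.0}
blended = route_action([0.0], experts, scores)
```

With equal scores the two experts contribute equally. Raising one score shifts the blended action toward that expert, which is the behavior a semantic router would exploit when the task context changes.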

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: MetaWorld-X creatively combines human motion priors, world models, and VLM-based semantic routing to address key problems in high-degree-of-freedom humanoid loco-manipulation control: skill interference, unnatural motion, and poor generalization. Despite strong experimental results on Humanoid-bench and a clearly presented method, the lack of real-robot validation limits its impact.
