Commanding Humanoid by Free-form Language: A Large Language Action Model with Unified Motion Vocabulary

Authors: Zhirui Liu, Kaiyang Ji, Ke Yang, Jingyi Yu, Ye Shi, Jingya Wang | Date: 2026-04-10 | DOI: 10.48550/arXiv.2511.22963


Essence

Figure 1. An illustration of Humanoid-LLA. Given a high-level

The paper proposes Humanoid-LLA, a Large Language Action Model that maps free-form natural-language commands to whole-body control of a humanoid robot. It combines a unified motion vocabulary, vocabulary-directed controller distillation, and reinforcement-learning-based fine-tuning to achieve language generalization and physical plausibility at the same time.
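To make the idea of a "motion vocabulary" concrete, here is a minimal sketch, assuming the vocabulary behaves like a codebook of discrete tokens with nearest-neighbor lookup. This summary does not say how Humanoid-LLA builds or uses its vocabulary, so the codebook values, the 2-D feature frames, and the function names `quantize`, `tokenize`, and `detokenize` are all hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy codebook: each entry stands for one "motion token".
# A real system would learn these embeddings; these values are made up.
CODEBOOK = [
    (0.0, 0.0),  # token 0
    (1.0, 0.0),  # token 1
    (0.0, 1.0),  # token 2
    (1.0, 1.0),  # token 3
]


def quantize(frame):
    """Return the index of the codebook entry nearest to one feature frame."""
    return min(range(len(CODEBOOK)), key=lambda i: math.dist(frame, CODEBOOK[i]))


def tokenize(motion):
    """Map a sequence of feature frames to a sequence of discrete tokens."""
    return [quantize(frame) for frame in motion]


def detokenize(tokens):
    """Map tokens back to codebook embeddings (a lossy reconstruction)."""
    return [CODEBOOK[t] for t in tokens]


# Assumed toy input: four 2-D feature frames standing in for a motion clip.
motion = [(0.1, -0.1), (0.9, 0.2), (0.8, 0.9), (0.1, 0.7)]
tokens = tokenize(motion)
print(tokens)              # → [0, 1, 3, 2]
print(detokenize(tokens))  # embeddings of the selected tokens
```

Once motion is expressed as discrete tokens like these, a language model can in principle emit motion the same way it emits words, which is one plausible reading of why a shared vocabulary helps connect language to control.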

Motivation

Achievement

Figure 3. Real-world demonstrations on Unitree G1 and Booster T1. The tested prompts contain unseen terms ("soldier", "m

How

Figure 2. An overview of Humanoid-LLA. In stage one, we build a unified motion vocabulary leveraging a large-scale paire

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: Humanoid-LLA integrates a unified motion vocabulary, vocabulary-directed distillation, and reinforcement-learning fine-tuning to achieve, for the first time, a mapping from free-form language to physically executable humanoid robot control. This is an important contribution; with real-world validation and clear technical innovation, it marks a significant advance in human-robot interaction.
