FRoM-W1: Towards General Humanoid Whole-Body Control with Language Instructions

์ €์ž: Peng Li, Zihan Zhuang, Yangfan Gao, Yi Dong, Sixian Li, Changhao Jiang, Shihan Dou, Zhiheng Xi, Enyu Zhou, Jixuan Huang, Hui Li, Jingjing Gong, Xingjun Ma, Tao Gui, Zuxuan Wu, Qi Zhang, Xuanjing Huang, Yu-Gang Jiang, Xipeng Qiu | ๋‚ ์งœ: 2026-01-19 | DOI: 10.48550/arXiv.2601.12799 📄 PDF


Essence

Figure 2 | The inference pipeline of FRoM-W1. (a) H-GPT first translates language instructions

FRoM-W1 is an open-source framework that controls the whole-body motion of a humanoid robot from natural-language instructions. Its two-stage design, the H-GPT model followed by the H-ACT module, combines language understanding with stable execution on the robot.

Motivation

Achievement

Figure 1 | (a) We introduce FRoM-W1, an open-source framework that leverages Chain-of-Thought

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: FRoM-W1์€ ์ž์—ฐ์–ด ๊ธฐ๋ฐ˜ ํœด๋จธ๋…ธ์ด๋“œ ์ „์‹  ์ œ์–ด๋ผ๋Š” ์ค‘์š”ํ•œ ๋ฌธ์ œ๋ฅผ Chain-of-Thought์™€ 2๋‹จ๊ณ„ RL ์ „๋žต์œผ๋กœ ์ฐฝ์˜์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋ฉฐ, ์™„์ „ ์˜คํ”ˆ์†Œ์Šค ์ œ๊ณต๊ณผ ์‹ค์ œ ๋กœ๋ด‡ ์‹ค์ฆ์„ ํ†ตํ•ด ๋†’์€ ์‹ค์šฉ์„ฑ๊ณผ ์žฌํ˜„์„ฑ์„ ๋ณด์—ฌ์ค€๋‹ค.
