HumanX: Toward Agile and Generalizable Humanoid Interaction Skills from Human Videos

Authors: Yinhuai Wang, Qihan Zhao, Yuen Fui Lau, Runyi Yu, Hok Wai Tsui, Qifeng Chen, Jingbo Wang, Jiangmiao Pang, Ping Tan | Date: 2026-02-02 | DOI: 10.48550/arXiv.2602.02473


Essence

Fig. 1: HumanX enables diverse interaction skills through two core components. XGen synthesizes and augments humanoid in

HumanX is a full-stack framework for learning humanoid robot interaction skills from human videos. It integrates two core components, the XGen data generation pipeline and the XMimic imitation learning framework, to acquire generalizable real-world skills without task-specific reward design.

Motivation

Achievement

How

Fig. 2: Overview of XGen. The pipeline begins by estimating SMPL-based human motion from video and retargeting it to the

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: HumanX combines physics-based data synthesis with generalization-first imitation learning, presenting a groundbreaking methodology for efficiently acquiring diverse interaction skills for real-world humanoid robots from a single video. With a more than 8x improvement in generalization performance and demonstrations of adaptive behavior, it makes a substantial contribution to robotics.
