Opening the Sim-to-Real Door for Humanoid Pixel-to-Action Policy Transfer

์ €์ž: Haoru Xue, Tairan He, Zi Wang, Qingwei Ben, Wenli Xiao, Zhengyi Luo, Xingye Da, Fernando Castaรฑeda, Guanya Shi, Shankar Sastry, Linxi "Jim" Fan, Yuke Zhu | ๋‚ ์งœ: 2025-11-30 | DOI: 10.48550/arXiv.2512.01061 📄 PDF


Essence

Figure 2

Figure 2: DoorMan training pipeline. All phases are done interactively with IsaacLab. In Phase 1, we train a …

Using GPU-accelerated photorealistic simulation and a teacher-student-bootstrap training framework, the authors develop a sim-to-real policy that lets a humanoid robot open diverse doors from RGB vision alone.
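The overall assessment attributes part of the gain to GRPO-based bootstrapping. GRPO (Group Relative Policy Optimization) replaces a learned value baseline with a group baseline: several rollouts are collected from the same starting condition, and each rollout's advantage is its return normalized by the group's mean and standard deviation. The sketch below shows that computation together with the PPO-style clipped surrogate that GRPO optimizes. It is an illustration under assumptions, not DoorMan's code: the KL penalty is omitted, the population standard deviation is used, the binary "door opened" return is invented for the example, and `group_relative_advantages` and `clipped_surrogate` are hypothetical names.

```python
import math
from statistics import fmean, pstdev


def group_relative_advantages(group_returns, eps=1e-8):
    """GRPO-style advantages for one group of rollouts.

    Each rollout's return is normalized by the group mean and (population)
    standard deviation, so no learned value function is needed as a baseline.
    """
    mu = fmean(group_returns)
    sigma = pstdev(group_returns)
    # eps keeps a group of identical returns from dividing by zero (all advantages become 0).
    return [(r - mu) / (sigma + eps) for r in group_returns]


def clipped_surrogate(logp_new, logp_old, advantage, clip=0.2):
    """PPO-style clipped objective for one action (to be maximized).

    ratio = pi_new(a|s) / pi_old(a|s); clipping keeps the update close to
    the policy that collected the data. GRPO's KL penalty is omitted here.
    """
    ratio = math.exp(logp_new - logp_old)
    clipped_ratio = min(max(ratio, 1.0 - clip), 1.0 + clip)
    # Taking the minimum makes the objective a pessimistic bound.
    return min(ratio * advantage, clipped_ratio * advantage)


# Usage (hypothetical returns): four rollouts from one start; 1.0 = door opened, 0.0 = failed.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

In this toy group, successful rollouts receive an advantage of about +1 and failures about -1, so the update pushes probability toward whatever the successful attempts did, relative to their siblings from the same start.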

Motivation

Achievement

Figure 1

Figure 1: DoorMan, a simulation-trained, RGB-only humanoid loco-manipulation policy, opens diverse, real-world doors.

How

Figure 3

Figure 3: Overview of the staged-reset exploration scheme. When entering a new stage, a snapshot of the …
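The Figure 3 caption describes saving a snapshot when a rollout enters a new stage. One way such a scheme could work is sketched here under assumptions that are not taken from the paper: stages are integer milestones (0 is the initial state), snapshots are plain dicts standing in for simulator state, each stage keeps a bounded FIFO of snapshots, and resets pick a stage uniformly among those reached so far. `StagedResetBuffer`, `record`, and `sample_reset` are hypothetical names, not IsaacLab APIs.

```python
import copy
import random
from collections import defaultdict, deque


class StagedResetBuffer:
    """Hypothetical snapshot store for staged-reset exploration.

    Stage 0 is the task's initial state; stages 1..K are intermediate
    milestones (for door opening one might assume, e.g., 1 = handle grasped,
    2 = handle turned, 3 = door ajar; these labels are illustrative).
    """

    def __init__(self, capacity_per_stage: int = 64):
        # One bounded FIFO per stage: once full, the oldest snapshot is dropped.
        self.snapshots = defaultdict(lambda: deque(maxlen=capacity_per_stage))

    def record(self, stage: int, sim_state: dict) -> None:
        """Store a deep copy of the simulator state when a rollout enters `stage`."""
        if stage < 1:
            raise ValueError("stage 0 is the initial state and is never recorded")
        self.snapshots[stage].append(copy.deepcopy(sim_state))

    def sample_reset(self, initial_state: dict, rng: random.Random):
        """Return (stage, state) to reset an environment from.

        The stage is drawn uniformly from stage 0 plus every stage that
        currently holds at least one snapshot; the state is a deep copy.
        """
        stages = [0] + sorted(s for s, buf in self.snapshots.items() if buf)
        stage = rng.choice(stages)
        if stage == 0:
            return 0, copy.deepcopy(initial_state)
        return stage, copy.deepcopy(rng.choice(self.snapshots[stage]))


# Usage: a toy rollout reaches stage 1 twice and stage 2 once (made-up states).
rng = random.Random(0)
buf = StagedResetBuffer(capacity_per_stage=8)
init = {"door_angle": 0.0}
buf.record(1, {"door_angle": 0.0, "handle_grasped": True})
buf.record(1, {"door_angle": 0.02, "handle_grasped": True})
buf.record(2, {"door_angle": 0.3, "handle_grasped": True})
stage, state = buf.sample_reset(init, rng)
```

Uniform sampling over reached stages is only one simple choice; it presumably helps because episodes can start near later milestones that a policy starting from scratch would rarely reach. A practical version might weight stages or control the fraction of fresh initial resets, which this sketch does not attempt.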

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: ์ˆœ์ˆ˜ RGB ์‹œ๊ฐ๋งŒ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋‹ค์–‘ํ•œ ์‹ค์ œ ๋ฌธ์„ ์—ฌ๋Š” ์ธ๊ฐ„ํ˜• ๋กœ๋ด‡ ์ •์ฑ…์„ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ๋งŒ ํ›ˆ๋ จํ•˜์—ฌ ์˜์  ์ƒท ์ „์ด์— ์„ฑ๊ณตํ•œ ํš๊ธฐ์ ์ธ ์—ฐ๊ตฌ๋กœ, staged-reset ํƒ์ƒ‰๊ณผ GRPO ๊ธฐ๋ฐ˜ bootstrapping ๋“ฑ์˜ novel ๋ฐฉ๋ฒ•๋ก ์ด ์‹ค์งˆ์  ์„ฑ๋Šฅ ๊ฐœ์„ ์„ ์ž…์ฆํ•œ๋‹ค.
