SCDP: Learning Humanoid Locomotion from Partial Observations via Mixed-Observation Distillation

Authors: Milo Carroll, Tianhu Peng, Lingfan Bao, Chengxu Zhou, Zhibin Li | Date: 2026-03-10 | URL: https://arxiv.org/abs/2603.09574


Essence

Figure 2

Fig. 2: Sensor-Conditioned Diffusion Policies (SCDP) architecture and training framework. The state-action diffusion

The paper proposes SCDP (Sensor-Conditioned Diffusion Policies), which uses mixed-observation distillation to learn humanoid locomotion from onboard sensors alone: the diffusion model is conditioned on the sensor history and trained to predict privileged future state-action trajectories.
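The idea in the sentence above can be sketched as a denoising loss in which the conditioning input contains only sensor history while the diffused target is the privileged state-action trajectory. This is a minimal sketch under stated assumptions, not the authors' implementation: the linear noise schedule, the noise-prediction objective, the linear `denoiser` stand-in, and all names and dimensions are hypothetical choices made for illustration.

```python
import numpy as np

def make_alpha_bar(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear noise schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, num_steps)
    return np.cumprod(1.0 - betas)

def noise_trajectory(traj, t, alpha_bar, rng):
    """Forward-diffuse a clean trajectory to step t; returns (noisy, eps)."""
    eps = rng.standard_normal(traj.shape)
    noisy = np.sqrt(alpha_bar[t]) * traj + np.sqrt(1.0 - alpha_bar[t]) * eps
    return noisy, eps

def denoiser(noisy_traj, sensor_history, t, params):
    """Hypothetical stand-in for the network: a linear map from
    [noisy trajectory, sensor history, normalized step] to predicted noise."""
    x = np.concatenate(
        [noisy_traj.ravel(), sensor_history.ravel(), [t / params["T"]]]
    )
    return (params["W"] @ x).reshape(noisy_traj.shape)

def distillation_loss(priv_traj, sensor_history, params, alpha_bar, rng):
    """Mixed-observation loss: the target trajectory is privileged,
    but the conditioning is restricted to onboard sensor history."""
    t = int(rng.integers(0, params["T"]))
    noisy, eps = noise_trajectory(priv_traj, t, alpha_bar, rng)
    eps_hat = denoiser(noisy, sensor_history, t, params)
    return float(np.mean((eps_hat - eps) ** 2))

# Toy dimensions (all hypothetical).
rng = np.random.default_rng(0)
T, horizon, priv_dim, hist_len, sensor_dim = 50, 8, 6, 4, 3
alpha_bar = make_alpha_bar(T)
in_dim = horizon * priv_dim + hist_len * sensor_dim + 1
params = {"T": T, "W": 0.01 * rng.standard_normal((horizon * priv_dim, in_dim))}

priv_traj = rng.standard_normal((horizon, priv_dim))         # privileged future states + actions
sensor_history = rng.standard_normal((hist_len, sensor_dim))  # onboard sensor readings only
loss = distillation_loss(priv_traj, sensor_history, params, alpha_bar, rng)
```

In this sketch the privileged quantities appear only in the training target, so at deployment the denoiser needs nothing beyond the sensor history and the current noisy sample.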

Motivation

Achievement

Figure 1

Fig. 1: Deployment of Sensor-Conditioned Diffusion Policies

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: Mixed-observation distillation is a conceptually strong solution, and carrying it through to deployment on a real robot is a notable strength. The generalization range and sensor robustness still need further validation, but this is solid work, as recognized by its acceptance at IROS.
