HoRD: Robust Humanoid Control via History-Conditioned Reinforcement Learning and Online Distillation

Authors: Puyue Wang, Jiawei Hu, Yan Gao, Junyan Wang, Yu Zhang, Gillian Dobbie, Tao Gu, Wafa Johal, Ting Dang, Hong Jia | Date: 2026-02-04 | URL: https://arxiv.org/abs/2602.04412


Essence

Figure 1

Figure 1. Framework overview. Two-stage teacher–student learning pipeline for robust humanoid control under partial obse

HoRD is a two-stage learning framework that combines history-conditioned reinforcement learning with online distillation, so that a humanoid robot can maintain robust control under domain shift.
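To make the teacher–student idea concrete, here is a minimal toy sketch, not the paper's implementation. Everything in it is an assumption for illustration: a 1-D system with a hidden per-episode disturbance `d`, a hand-written privileged teacher standing in for an RL-trained one, a linear student, and a DAgger-style loop as one common reading of "online distillation". It only shows why a student that conditions on its recent history can imitate a teacher that sees hidden dynamics, while a memoryless student cannot.

```python
import numpy as np

# Toy assumption: dynamics x_next = x + a + d, where d is hidden from the student.

def teacher_action(x, d):
    # Privileged teacher (hypothetical stand-in for an RL-trained policy): observes d.
    return -0.5 * x - d

def features(x, prev_x, prev_a, use_history):
    # Student input: current state, plus the previous (state, action) pair if enabled.
    return np.array([x, prev_x, prev_a]) if use_history else np.array([x])

def rollout(w, d, use_history, steps=20, x0=1.0):
    """Roll out the linear student w and label every visited state with the teacher."""
    prev_x, prev_a = x0, 0.0
    x = x0 + d  # warm-up step with zero action, only to fill the history
    feats, labels = [], []
    for _ in range(steps):
        f = features(x, prev_x, prev_a, use_history)
        a = float(w @ f)                      # the student acts
        feats.append(f)
        labels.append(teacher_action(x, d))   # the teacher only labels
        prev_x, prev_a = x, a
        x = x + a + d
    return np.array(feats), np.array(labels)

def distill(use_history, iterations=5, episodes=16, seed=0):
    """DAgger-style loop: collect on-policy data, aggregate, refit by least squares."""
    rng = np.random.default_rng(seed)
    dim = 3 if use_history else 1
    w = np.zeros(dim)
    F, Y = np.empty((0, dim)), np.empty(0)
    for _ in range(iterations):
        for d in rng.uniform(-1.0, 1.0, size=episodes):  # new hidden dynamics per episode
            f, y = rollout(w, d, use_history)
            F, Y = np.vstack([F, f]), np.concatenate([Y, y])
        w, *_ = np.linalg.lstsq(F, Y, rcond=None)
    mse = float(np.mean((F @ w - Y) ** 2))  # imitation error on the aggregated data
    return w, mse

if __name__ == "__main__":
    for use_history in (True, False):
        w, mse = distill(use_history)
        print(f"use_history={use_history}: weights={w}, imitation MSE={mse:.3e}")
```

In this toy, the previous state and action determine the disturbance exactly (`d = x - prev_x - prev_a`), so the history-conditioned student can reproduce the teacher, whereas the memoryless student is left with a residual imitation error. A real humanoid controller would use a longer history window and a neural policy; those details are not given in this summary.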

Motivation

Achievement

Figure 2

Figure 2. Results of HoRD on six representative motions, while red markers indicate ground-truth skeleton joints. Qualit

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: HoRD effectively addresses robustness and generalization in humanoid control through two key innovations, history-conditioned dynamics inference and sparse command handling, and it demonstrates practical value through extensive experimental validation and a public dataset release.
