Towards Adaptive Humanoid Control via Multi-Behavior Distillation and Reinforced Fine-Tuning

์ €์ž: Yingnan Zhao, Xinmiao Wang, Dewei Wang, Xinzhe Liu, Dan Lu, Qilong Han, Peng Liu, Chenjia Bai | ๋‚ ์งœ: 2025-11-11 | DOI: 10.48550/arXiv.2511.06371 📄 PDF


Essence


Figure 2: Overview of the proposed two-stage framework Adaptive Humanoid Control. In the first stage, we train two separ

ํœด๋จธ๋…ธ์ด๋“œ ๋กœ๋ด‡์ด ๋‹ค์–‘ํ•œ ์ด์กฑ๋ณดํ–‰ ํ–‰๋™(์„œ๊ธฐ, ๊ฑท๊ธฐ, ๋›ฐ๊ธฐ, ์ ํ”„)์„ ํ•™์Šตํ•  ์ˆ˜ ์žˆ๋„๋ก ๋‹ค์ค‘ํ–‰๋™ ์ฆ๋ฅ˜(multi-behavior distillation)์™€ ๊ฐ•ํ™”ํ•™์Šต ๋ฏธ์„ธ์กฐ์ •์„ ํ†ตํ•ด ์ ์‘ํ˜• ์ œ์–ด๊ธฐ๋ฅผ ๊ฐœ๋ฐœํ•œ๋‹ค.

Motivation

Achievement


Figure 1: Comparison between multi-task RL and our pro-

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋‹ค์ค‘ํ–‰๋™ ์ฆ๋ฅ˜์™€ ๊ฐ•ํ™”ํ•™์Šต ๋ฏธ์„ธ์กฐ์ •์„ ๊ฒฐํ•ฉํ•œ 2๋‹จ๊ณ„ ํ”„๋ ˆ์ž„์›Œํฌ๋Š” ํœด๋จธ๋…ธ์ด๋“œ ๋กœ๋ด‡์˜ ์ ์‘ํ˜• ์ œ์–ด๋ผ๋Š” ์ค‘์š”ํ•œ ๋ฌธ์ œ์— ๋Œ€ํ•œ ์‹ค์šฉ์ ์ด๊ณ  ํšจ๊ณผ์ ์ธ ํ•ด๊ฒฐ์ฑ…์„ ์ œ์‹œํ•˜๋ฉฐ, ์‹œ๋ฎฌ๋ ˆ์ด์…˜๊ณผ ์‹ค๋กœ๋ด‡ ์‹คํ—˜์„ ํ†ตํ•ด ๊ทธ ํƒ€๋‹น์„ฑ์„ ์ž…์ฆํ–ˆ๋‹ค.
