Success in Humanoid Reinforcement Learning under Partial Observation

Authors: Wuhao Wang, Zhiyong Chen | Date: 2025-07-25 | URL: https://arxiv.org/abs/2507.18883


Essence

Figure 1. Training performance under three partial observability configurations.

The paper proposes a novel history encoder that processes a fixed-length sequence of past observations in parallel, and with it achieves stable humanoid policy learning under partial observation in the Gymnasium Humanoid-v4 environment for the first time.
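To make the idea of a fixed-length history processed in parallel more concrete, here is a minimal sketch in plain Python. It is not the authors' architecture, which this review does not describe in detail. The names `mask_observation`, `HistoryBuffer`, and `encode_history`, the zero-padding at reset, and the shared `tanh(Wx + b)` per-frame map are all assumptions chosen for illustration.

```python
import math
from collections import deque


def mask_observation(full_obs, visible_indices):
    """Simulate partial observation by keeping only the listed entries (assumed scheme)."""
    return [full_obs[i] for i in visible_indices]


class HistoryBuffer:
    """Fixed-length window of past observations, zero-padded after reset (hypothetical helper)."""

    def __init__(self, length, obs_dim):
        self.length = length
        self.obs_dim = obs_dim
        self.frames = deque(maxlen=length)
        self.reset()

    def reset(self):
        self.frames.clear()
        for _ in range(self.length):
            self.frames.append([0.0] * self.obs_dim)

    def push(self, obs):
        if len(obs) != self.obs_dim:
            raise ValueError("unexpected observation size")
        self.frames.append(list(obs))  # the oldest frame drops out automatically

    def window(self):
        return [list(frame) for frame in self.frames]  # oldest frame first


def encode_history(window, weights, bias):
    """Apply one shared tanh(Wx + b) map to every frame, then concatenate.

    No hidden state is passed from one frame to the next, so all frames
    could be encoded in a single batched operation, unlike an RNN.
    """
    features = []
    for frame in window:
        for row, b in zip(weights, bias):
            features.append(math.tanh(sum(w * x for w, x in zip(row, frame)) + b))
    return features


# Usage: hide the middle entry of a 3-dimensional observation, keep 3 frames.
buf = HistoryBuffer(length=3, obs_dim=2)
buf.push(mask_observation([0.5, 9.0, -0.5], visible_indices=[0, 2]))
code = encode_history(buf.window(), weights=[[1.0, 0.0]], bias=[0.0])
print(len(code))  # 3 frames x 1 feature per frame
```

The point of the sketch is the contrast with recurrent memory: each frame is encoded independently of the others, so the window can be handled as one batch, whereas an RNN must step through the frames in order.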

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 3/5 | Overall: 4/5

Overall assessment: This work is the first to successfully solve the open problem of high-dimensional humanoid control under partial observation, and its parallel history encoder outperforms existing RNN-based memory methods by a wide margin. However, the methodology is not described in enough concrete detail, and validation on a real robot is still needed.
