Robust Humanoid Walking on Compliant and Uneven Terrain with Deep Reinforcement Learning

์ €์ž: Rohan P. Singh, Mitsuharu Morisawa, Mehdi Benallegue, Zhaoming Xie, Fumio Kanehiro | ๋‚ ์งœ: 2025-04-18 | URL: https://arxiv.org/abs/2504.13619 📄 PDF


Essence

Figure 1

Fig. 1: HRP-5P humanoid bipedal locomotion (clockwise) on flat rigid

Deep RL์„ ์ด์šฉํ•˜์—ฌ humanoid robot HRP-5P๊ฐ€ ์‹œ๋ฎฌ๋ ˆ์ด์…˜์—์„œ terrain randomization์œผ๋กœ ํ•™์Šตํ•œ ์ •์ฑ…์„ ์‹ค์ œ ํ™˜๊ฒฝ์˜ compliantํ•˜๊ณ  unevenํ•œ terrain์—์„œ๋„ robustํ•˜๊ฒŒ ๋ณดํ–‰ํ•˜๋„๋ก ํ•˜๋Š” ์—ฐ๊ตฌ์ด๋‹ค.

Motivation

Achievement

Figure 1

Fig. 1: HRP-5P humanoid bipedal locomotion (clockwise) on flat rigid

How

Figure 2

Fig. 2: Overview of our training framework. (L) We propose to train a feedforward RL agent while exposing it to randomiz

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: Life-sized humanoid์˜ challenging terrain ๋ณดํ–‰์„ ์œ„ํ•œ deep RL ๊ธฐ๋ฐ˜ ์ ‘๊ทผ๋ฒ•์˜ ์‹ค์ œ ๊ตฌํ˜„์„ ์„ฑ๊ณต์ ์œผ๋กœ ์ž…์ฆํ–ˆ์œผ๋ฉฐ, sim-to-real transfer์™€ adaptive gait control์˜ ํšจ๊ณผ๋ฅผ ๋ช…ํ™•ํžˆ ๋ณด์—ฌ์ค€ ์˜๋ฏธ ์žˆ๋Š” ์—ฐ๊ตฌ์ด๋‹ค. ๋‹ค๋งŒ clock control ์ •์ฑ…์˜ ์‹ค์ œ ์ ์šฉ ํšจ๊ณผ ๊ฒ€์ฆ๊ณผ failure case ๋ถ„์„์ด ๋ณด๊ฐ•๋˜๋ฉด ๋”์šฑ ์™„์„ฑ๋„ ๋†’์€ ์ž‘์—…์ด ๋  ์ˆ˜ ์žˆ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •