Reinforcement Learning for Versatile, Dynamic, and Robust Bipedal Locomotion Control

Authors: Zhongyu Li, Xue Bin Peng, Pieter Abbeel, Sergey Levine, Glen Berseth, Koushil Sreenath | Date: 2024-01-30 | URL: https://arxiv.org/abs/2401.16889


Essence

The paper presents a deep reinforcement learning framework with a dual-history architecture for unified control of diverse dynamic locomotion skills on bipedal robots (walking, running, jumping), and deploys the learned policies from simulation to a real robot (Cassie) without additional tuning.

Motivation

Achievement

How

Fig. 3: The proposed RL-based controller architecture that leverages

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This is a strong study on the challenging problem of bipedal robot control. It achieves a unified RL framework through a dual-history architecture and task randomization, and demonstrates robust realization of diverse dynamic locomotion skills through extensive real-robot experiments. However, the theoretical justification for the architectural design choices needs strengthening, and extensibility to other platforms remains to be verified.

