Benchmarking Potential Based Rewards for Learning Humanoid Locomotion

Authors: Se Hwan Jeon, Steve Heim, Charles Khazoom, Sangbae Kim | Date: 2023-07-19 | URL: https://arxiv.org/abs/2307.10142


Essence

Figure 2

Fig. 2: A visualization of a tracking reward in both direct-

This paper benchmarks potential-based reward shaping (PBRS) against direct reward shaping (DRS) for learning high-dimensional locomotion on a humanoid robot. It shows empirically that PBRS offers only a marginal benefit in convergence speed but is considerably more robust to the scaling of the reward terms.
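For context, here is a minimal sketch of how the two shaping styles differ, assuming the standard PBRS formulation F(s, s') = γΦ(s') − Φ(s). The exponential tracking potential, the discount factor of 0.99, the sample error trajectory, and all function names are illustrative assumptions, not the reward terms used in the paper.

```python
import math

GAMMA = 0.99  # discount factor (assumed value, not taken from the paper)


def potential(tracking_error: float, scale: float = 1.0) -> float:
    """Hypothetical potential Phi(s): larger when the tracking error is smaller."""
    return scale * math.exp(-tracking_error ** 2)


def direct_reward(err_next: float, scale: float = 1.0) -> float:
    """DRS: the shaping term is added to the reward as-is at every step."""
    return potential(err_next, scale)


def pbrs_reward(err: float, err_next: float, scale: float = 1.0,
                gamma: float = GAMMA) -> float:
    """PBRS: reward the discounted change in potential, gamma*Phi(s') - Phi(s)."""
    return gamma * potential(err_next, scale) - potential(err, scale)


def discounted_sum(rewards, gamma: float = GAMMA) -> float:
    """Discounted return of a finite reward sequence."""
    return sum(gamma ** t * r for t, r in enumerate(rewards))


# A made-up trajectory in which the tracking error shrinks over time.
errors = [1.0, 0.6, 0.3, 0.1, 0.0]
transitions = list(zip(errors, errors[1:]))

pbrs_return = discounted_sum([pbrs_reward(e0, e1) for e0, e1 in transitions])
drs_return = discounted_sum([direct_reward(e1) for _, e1 in transitions])

# The PBRS terms telescope: only the first and last potentials remain.
telescoped = GAMMA ** len(transitions) * potential(errors[-1]) - potential(errors[0])

print(f"PBRS return: {pbrs_return:.6f} (telescoped: {telescoped:.6f})")
print(f"DRS return:  {drs_return:.6f}")
```

In this sketch the discounted PBRS return depends only on the potentials at the start and end of the trajectory, whereas the DRS return accumulates the shaping term at every step and therefore grows with both the scale factor and the episode length.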

Motivation

Achievement

Figure 3

Fig. 3: Values for the total baseline rewards during training

How

Figure 1

Fig. 1: The potential based (left), direct (middle), and base-

Originality

Limitation & Further Study

Evaluation

Novelty: 3/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is an important case study that empirically validates the practical effect of PBRS on a high-dimensional robotic system, and it offers practical guidance for reward function design, particularly with respect to robustness. The contribution would be stronger if it went beyond a single-task benchmark and analyzed the causes of the gap between theory and practice in more depth.
