DoublyAware: Dual Planning and Policy Awareness for Temporal Difference Learning in Humanoid Locomotion

Authors: Khang Nguyen, An T. Le, Jan Peters, Minh Nhat Vu | Date: 2025-06-12 | URL: https://arxiv.org/abs/2506.12095


Essence

Fig. 1: Overview of DoublyAware: Disjoint uncertainty decomposi-

DoublyAware explicitly decomposes uncertainty in the TD-MPC framework into planning uncertainty and policy uncertainty. It addresses the former with conformal prediction and the latter with a Group-Relative Policy Constraint (GRPC), aiming at sample-efficient and stable learning for humanoid robots.
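This summary does not give the paper's nonconformity score or calibration procedure, so the following is only a minimal sketch of generic split conformal prediction, the family of methods the planning-side component draws on. The names `conformal_quantile`, `filter_candidates`, and `score_fn` are hypothetical, and the "lower score is better" convention is an assumption made for illustration.

```python
import math


def conformal_quantile(scores, alpha):
    """Split-conformal threshold from calibration scores.

    Returns the ceil((n + 1) * (1 - alpha))-th smallest score, the standard
    finite-sample-corrected quantile. Under exchangeability, a fresh score
    falls at or below it with probability at least 1 - alpha.
    """
    n = len(scores)
    if n == 0:
        raise ValueError("need at least one calibration score")
    rank = math.ceil((n + 1) * (1 - alpha))
    if rank > n:
        # Too few calibration points for this alpha: no finite threshold.
        return math.inf
    return sorted(scores)[rank - 1]


def filter_candidates(candidates, score_fn, threshold):
    """Keep candidates whose nonconformity score is within the threshold.

    Assumption: score_fn maps a candidate (for example, a sampled
    trajectory) to a nonconformity score where lower means more trusted.
    """
    return [c for c in candidates if score_fn(c) <= threshold]


# Hypothetical calibration scores and candidates, for illustration only.
calibration = [0.3, 0.1, 0.7, 0.5, 0.2, 0.6, 0.4]
threshold = conformal_quantile(calibration, alpha=0.25)
candidates = [("a", 0.2), ("b", 0.65), ("c", 0.6)]
kept = filter_candidates(candidates, lambda c: c[1], threshold)
```

With seven calibration scores and `alpha=0.25`, the rank is `ceil(8 * 0.75) = 6`, so the threshold is the sixth-smallest score. How DoublyAware actually scores and selects trajectories should be taken from the paper itself.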

Motivation

Achievement

Fig. 4: Episode Returns of DoublyAware and Baselines on H1-2 in Locomotion Tasks: DoublyAware achieves rapid convergence

How

Fig. 2: Uncertainty-Aware Planning for Humanoid Locomotion: At each planning step, two sets of trajectories are sampled

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: The paper splits uncertainty, a core problem in MBRL, into a planning component and a policy component and proposes a principled remedy for each (conformal prediction and GRPC, respectively), which gives it both conceptual clarity and technical strength. It shows empirical improvements on the challenging problem of humanoid robot control, but validation on a real robot and an analysis of computational cost would make the contribution stronger.
