Coordinated Humanoid Robot Locomotion with Symmetry Equivariant Reinforcement Learning Policy

Authors: Buqing Nie, Yang Zhang, Rongjun Jin, Zhanxiang Cao, Huangxuan Lin, Xiaokang Yang, Yue Gao | Date: 2025-08-02 | URL: https://arxiv.org/abs/2508.01247


Essence

Figure 1

Figure 1: The overall architecture of SE-Policy. (a) Left: the architecture of the actor and critic model. (b) upper rig

Inspired by the human nervous system, the paper proposes the Symmetry Equivariant Policy (SE-Policy), which strictly embeds the humanoid robot's morphological symmetry into the deep reinforcement learning (DRL) framework to achieve coordinated, balanced locomotion.
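To make "symmetry embedded in the network" concrete, here is a minimal Python sketch of one standard way to get exact mirror symmetry: average an unconstrained network's output with its mirrored counterpart. It is an illustration under stated assumptions, not necessarily the authors' SE-Policy architecture. The observation/action layout, the mirror tables `OBS_PERM`/`OBS_SIGN`/`ACT_PERM`/`ACT_SIGN`, and the helpers `mirror`, `make_mlp`, `equivariant_actor`, and `invariant_critic` are all hypothetical. The sketch also assumes, as is common for mirror-symmetric robots, an equivariant actor and an invariant critic. A random-weight MLP stands in for a trained network.

```python
import math
import random

# HYPOTHETICAL toy layout, not taken from the paper:
# obs = [vel_x, vel_y, roll, left_knee, right_knee, yaw_rate], act = [left_knee_cmd, right_knee_cmd].
# A sagittal-plane mirror is a signed permutation: out[i] = sign[i] * x[perm[i]].
OBS_PERM = [0, 1, 2, 4, 3, 5]
OBS_SIGN = [1, -1, -1, 1, 1, -1]  # lateral velocity, roll and yaw rate flip sign
ACT_PERM = [1, 0]                 # left/right knee commands swap
ACT_SIGN = [1, 1]


def mirror(x, perm, sign):
    """Apply a signed permutation; these tables make it an involution (mirroring twice is identity)."""
    return [s * x[p] for p, s in zip(perm, sign)]


def make_mlp(n_in, n_out, hidden=16, seed=0):
    """Tiny unconstrained tanh MLP with fixed random weights (stand-in for a trained network)."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0.0, 1.0) for _ in range(n_in)] for _ in range(hidden)]
    w2 = [[rng.gauss(0.0, 1.0) for _ in range(hidden)] for _ in range(n_out)]

    def forward(x):
        h = [math.tanh(sum(w * xi for w, xi in zip(row, x))) for row in w1]
        return [sum(w * hi for w, hi in zip(row, h)) for row in w2]

    return forward


base_actor = make_mlp(6, 2, seed=0)
base_critic = make_mlp(6, 1, seed=1)


def equivariant_actor(obs):
    """pi(s) = 0.5 * [f(s) + G_a f(G_s s)], hence pi(G_s s) = G_a pi(s) for any weights."""
    a = base_actor(obs)
    a_mir = mirror(base_actor(mirror(obs, OBS_PERM, OBS_SIGN)), ACT_PERM, ACT_SIGN)
    return [0.5 * (u + v) for u, v in zip(a, a_mir)]


def invariant_critic(obs):
    """V(s) = 0.5 * [g(s) + g(G_s s)], hence V(G_s s) = V(s) for any weights."""
    return 0.5 * (base_critic(obs)[0] + base_critic(mirror(obs, OBS_PERM, OBS_SIGN))[0])


obs = [0.8, 0.1, 0.05, 0.3, -0.2, 0.4]
m_obs = mirror(obs, OBS_PERM, OBS_SIGN)
print(equivariant_actor(m_obs))                            # action for the mirrored state ...
print(mirror(equivariant_actor(obs), ACT_PERM, ACT_SIGN))  # ... equals the mirrored action
print(invariant_critic(m_obs), invariant_critic(obs))      # same value twice
```

Because the mirror is an involution, both identities hold for whatever weights the base networks have, so the symmetry never has to be learned or traded off; the price of this particular construction is a second forward pass per evaluation.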

Motivation

Achievement

Figure 2

Figure 2: The tracking errors in terms of position (TE-P) and

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 4/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

Overall: SE-Policy is an innovative method that implements the humanoid robot's morphological symmetry as strict network constraints and achieves a 40% performance improvement without any additional hyperparameters. Its deployment on a real robot demonstrates practical value, which makes it a highly significant contribution.
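The "strict network constraints" in this verdict can be written as two identities. The notation here is ours, not the paper's: $G_s$ and $G_a$ are mirror operators on states and actions, and the sketch assumes an equivariant actor $\pi_\theta$ and an invariant critic $V_\phi$. For contrast, the second line shows a common soft alternative that penalizes symmetry violations with a weight $\lambda$ added to the RL loss $\mathcal{L}_{\text{RL}}$:

```latex
\begin{aligned}
&\text{hard:} && \pi_\theta(G_s\, s) = G_a\, \pi_\theta(s), \qquad V_\phi(G_s\, s) = V_\phi(s) \qquad \forall s, \\
&\text{soft:} && \mathcal{L}(\theta) = \mathcal{L}_{\text{RL}}(\theta) + \lambda\, \mathbb{E}_{s}\big\| \pi_\theta(G_s\, s) - G_a\, \pi_\theta(s) \big\|^2 .
\end{aligned}
```

A soft penalty only approximates the identities and introduces $\lambda$ as a tuning knob; a network built to satisfy them holds them for every parameter value, which is the sense in which no extra hyperparameter is needed.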

Related Papers

A Different Approach
Geometry-Aware Predictive Safety Filters on Humanoids takes a different, optimization-based approach, addressing safety in terms of contact and force estimation.