Reduced-Order Model-Guided Reinforcement Learning for Demonstration-Free Humanoid Locomotion

Authors: Shuai Liu, Meng Cheng Lau | Date: 2025-09-23 | URL: https://arxiv.org/abs/2509.19023


Essence

Figure 1: Overview of the ROM-GRL framework. In Stage 1, a 4-DOF ROM policy is trained in Box2D: the policy

ROM-GRL is a two-stage reinforcement learning framework that trains a full-body humanoid policy without motion-capture data, using gait templates generated by a 4-DOF reduced-order model (ROM). An adversarial discriminator steers the full-body policy toward the ROM's 5-dimensional gait feature distribution, which yields natural-looking walking.
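
The adversarial guidance described above can be illustrated with a small sketch. This is not the authors' implementation. It assumes a linear logistic discriminator in place of a neural network, a GAIL-style reward of the form -log(1 - D(x)), and synthetic 5-dimensional vectors standing in for the gait features, whose actual components are not given in this review. All function names (`discriminator_score`, `style_reward`, `train_step`, `demo`) are hypothetical.

```python
import math
import random

FEATURE_DIM = 5  # the review states a 5-dimensional gait feature; its components are not listed


def discriminator_score(weights, bias, features):
    """Return the probability that `features` came from the ROM (logistic model)."""
    logit = bias + sum(w * f for w, f in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-logit))


def style_reward(score, eps=1e-6):
    """GAIL-style reward (assumed form): grows as the discriminator is fooled."""
    return -math.log(max(1.0 - score, eps))


def train_step(weights, bias, rom_batch, policy_batch, lr=0.1):
    """One binary cross-entropy gradient step: ROM samples -> 1, policy samples -> 0."""
    grad_w = [0.0] * FEATURE_DIM
    grad_b = 0.0
    labelled = [(x, 1.0) for x in rom_batch] + [(x, 0.0) for x in policy_batch]
    for features, label in labelled:
        # Derivative of the cross-entropy loss with respect to the logit.
        err = discriminator_score(weights, bias, features) - label
        for i in range(FEATURE_DIM):
            grad_w[i] += err * features[i]
        grad_b += err
    n = len(labelled)
    new_weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
    new_bias = bias - lr * grad_b / n
    return new_weights, new_bias


def demo(seed=0, steps=200):
    """Train on synthetic stand-in features and score one sample of each kind."""
    rng = random.Random(seed)
    rom = [[rng.gauss(0.5, 0.1) for _ in range(FEATURE_DIM)] for _ in range(32)]
    policy = [[rng.gauss(-0.5, 0.1) for _ in range(FEATURE_DIM)] for _ in range(32)]
    weights, bias = [0.0] * FEATURE_DIM, 0.0
    for _ in range(steps):
        weights, bias = train_step(weights, bias, rom, policy)
    rom_like = discriminator_score(weights, bias, [0.5] * FEATURE_DIM)
    policy_like = discriminator_score(weights, bias, [-0.5] * FEATURE_DIM)
    return rom_like, policy_like
```

Calling `demo()` returns the discriminator's score for a ROM-like sample and for a policy-like sample. In a full training loop the style reward would be added to the task reward at each step, so the humanoid policy is paid more whenever its gait features become hard to tell apart from the ROM's. The weighting between the two terms is a design choice this review does not specify.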

Motivation

Achievement

Figure 3 visualizes pelvis and foot trajectories for the ROM-GRL policy (blue) and the pure-reward baseline (orange),

How

Figure 2: Schematic of the planar ROM used to generate reference walking trajectories. The ROM consists of a central

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ROM-GRL์€ reduced-order model์„ creativeํ•˜๊ฒŒ ํ™œ์šฉํ•ด motion capture ์˜์กด์„ฑ์„ ์ œ๊ฑฐํ•˜๋ฉด์„œ ์ž์—ฐ์Šค๋Ÿฝ๊ณ  ์•ˆ์ •์ ์ธ humanoid ๋ณดํ–‰์„ ๋‹ฌ์„ฑํ•˜๋Š” novel ํ”„๋ ˆ์ž„์›Œํฌ์ด๋‹ค. ๋ณด์ƒ ์„ค๊ณ„์™€ ๋ชจ๋ฐฉ ํ•™์Šต ๊ฐ„ ๊ฐ„๊ฒฉ์„ ํšจ๊ณผ์ ์œผ๋กœ ์ค„์˜€์œผ๋‚˜, ์ œํ•œ๋œ ์†๋„ ๋ฒ”์œ„์™€ ์‹ค์ œ ๋กœ๋ด‡ ๊ฒ€์ฆ ๋ถ€์žฌ๊ฐ€ ์ผ๋ฐ˜ํ™” ๊ฐ€๋Šฅ์„ฑ์˜ ์˜๋ฌธ์„ ๋‚จ๊ธด๋‹ค.
