RuN: Residual Policy for Natural Humanoid Locomotion

Essence

Fig. 2: Overview of the RuN framework. (a) Motion Retargeting: Raw human motions are converted into a kinematically feas

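RuN is a decoupled residual learning framework that separates a kinematic motion prior, produced by a Conditional Motion Generator (CMG), from a reinforcement-learning-based residual policy in order to achieve natural walk-to-run transitions on a humanoid robot.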

Motivation

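Known: Deep Reinforcement Learning (DRL) is a powerful tool for humanoid control, but a single policy has to learn motion imitation, velocity tracking, and stability at the same time, which makes the learning problem complex. Generative Motion Prior (GMP) approaches provide natural motions, but their direct tracking strategy still leaves the learning complexity high.
Gap: In existing DRL-based methods, the three objectives of motion imitation, velocity tracking, and stability maintenance conflict, forcing a trade-off between performance and learning efficiency. A systematic, decoupled architecture that resolves this conflict is lacking.
Why: Humanoid robots need to move naturally and dynamically across a range of speeds in human-centered environments, and smooth walk-to-run transitions in particular are essential for practical deployment.
Approach: The control task is decomposed into two parts: the CMG generates a kinematically natural motion prior, and a lightweight residual policy compensates for dynamic interactions.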

Achievement

Fig. 5: Performance comparison of different algorithms. This figure shows

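Decoupled Residual Learning Framework: splits locomotion control into a motion prior and a residual correction, substantially shrinking the space the policy has to learn
Conditional Motion Generator: an autoregressive generative model trained on a human motion dataset that produces natural motions spanning walking to running
Wide speed coverage: stable, natural locomotion with smooth transitions across the 0-2.5 m/s range
Real-robot validation: simulation and real-world experiments on the Unitree G1 humanoid show better performance than state-of-the-art baselines
Improved training efficiency: considerably faster training convergence than existing methods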

How

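A large-scale human motion dataset is converted into kinematically feasible reference data through motion retargeting
The autoregressive CMG is trained offline on the retargeted data and then frozen to serve as the motion prior
A lightweight residual policy is trained with PPO using a combination of imitation rewards, task rewards, and regularization rewards
Final control command = CMG output + residual policy output (an additive structure)
The policy is trained in simulation and then transferred to the real robot (sim-to-real)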
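
The additive structure above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: `compose_action`, `rollout`, `residual_scale`, and the toy stand-ins `toy_cmg_step` and `toy_residual` are hypothetical names, plain Python lists stand in for joint-target vectors, and the residual here sees only the prior pose and the commanded velocity, whereas a real policy would presumably also observe the robot state.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def compose_action(prior: Sequence[float],
                   residual: Sequence[float],
                   residual_scale: float = 1.0) -> Vector:
    """Final command = motion-prior output + (scaled) residual output.

    `residual_scale` is an assumption added for illustration; the source
    only states that the two outputs are summed.
    """
    if len(prior) != len(residual):
        raise ValueError("prior and residual must have the same length")
    return [p + residual_scale * r for p, r in zip(prior, residual)]


def rollout(cmg_step: Callable[[Vector, float], Vector],
            residual_policy: Callable[[Vector, float], Vector],
            initial_pose: Sequence[float],
            command_velocity: float,
            steps: int) -> List[Vector]:
    """Run the frozen prior autoregressively and add the residual each step."""
    pose = list(initial_pose)
    commands: List[Vector] = []
    for _ in range(steps):
        # Autoregressive prior: the next reference pose depends on the
        # previous reference pose and the velocity command.
        pose = cmg_step(pose, command_velocity)
        residual = residual_policy(pose, command_velocity)
        commands.append(compose_action(pose, residual))
    return commands


# Toy stand-ins (hypothetical): a linear drift for the prior and a
# constant correction for the residual policy.
def toy_cmg_step(prev_pose: Vector, velocity: float) -> Vector:
    return [q + 0.1 * velocity for q in prev_pose]


def toy_residual(pose: Vector, velocity: float) -> Vector:
    return [-0.05 for _ in pose]


if __name__ == "__main__":
    for command in rollout(toy_cmg_step, toy_residual, [0.0, 0.0], 1.0, 3):
        print(command)
```

Keeping the prior frozen and summing a small correction on top means the learned policy only has to represent the deviation from the reference motion, which is the decomposition the steps above describe.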

Originality

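Unlike existing GMP approaches based on direct tracking, achieves structural separation through residual learning
Applies an autoregressive CMG to humanoid locomotion, a new use of conditional motion generation
A principled decomposition that resolves the conflict among multiple objectives and substantially reduces learning complexity
First instance of smooth walk-to-run transitions realized within a residual policy framework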

Limitation & Further Study

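Because the CMG depends on an offline dataset, the quality and diversity of that dataset set the upper bound on final performance
The residual policy's correction range is limited, so its adaptability to extreme external perturbations or unexpected dynamics has not been verified
Experiments were run on a single platform, the Unitree G1, so generalization to other humanoid architectures is unclear
Further study: (1) real-time CMG readjustment through an adaptive residual policy, (2) navigation in more complex environments using visual information, (3) validation on more robot platforms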

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

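Overall assessment: RuN is a well-motivated framework that elegantly addresses the fundamental complexity of humanoid locomotion control. Its decoupled residual learning approach improves both learning efficiency and final performance, and the method has been validated on a real robot.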