PILOT: A Perceptive Integrated Low-level Controller for Loco-manipulation over Unstructured Scenes

Essence

Fig. 1. Method overview of PILOT. We propose a unified single-stage reinforcement learning framework that seamlessly int

PILOT is a unified single-stage RL framework for humanoid loco-manipulation. It integrates perceptive locomotion and whole-body control in a single policy, enabling stable task execution on unstructured terrain.

Known: Humanoid locomotion has progressed from blind policies to perception-based approaches that use elevation maps, while existing whole-body controllers have offered limited manipulation capability.
Gap: Prior work either performs loco-manipulation without terrain awareness or controls the lower body and upper body separately, and so misses natural whole-body coordination.
Why: Real-world applications require humanoid robots to traverse stairs and uneven terrain safely in human-centered environments while carrying out manipulation tasks.
Approach: A cross-modal context encoder fuses proprioceptive features with a LiDAR-based elevation map, and a Mixture-of-Experts (MoE) structure coordinates diverse motor skills, yielding a unified perceptive loco-manipulation controller.
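
To make the elevation-map input concrete, the following is a minimal sketch of how robot-frame LiDAR points could be binned into a robot-centric height grid. The function name `elevation_map`, the window size, the resolution, and the max-height rule are all illustrative assumptions; the source does not describe how PILOT actually builds its map.

```python
def elevation_map(points, size=1.0, resolution=0.25, fill=0.0):
    """Bin robot-frame (x, y, z) points into a square grid of max heights.

    All parameters are hypothetical defaults chosen for illustration.
    """
    n = int(round(size / resolution))
    half = size / 2.0
    grid = [[None] * n for _ in range(n)]
    for x, y, z in points:
        # Skip points outside the square window centered on the robot.
        if not (-half <= x < half and -half <= y < half):
            continue
        # Clamp indices to guard against floating-point rounding at the edge.
        row = min(n - 1, int((x + half) / resolution))
        col = min(n - 1, int((y + half) / resolution))
        if grid[row][col] is None or z > grid[row][col]:
            grid[row][col] = z
    # Cells with no LiDAR returns fall back to a default height.
    return [[fill if h is None else h for h in r] for r in grid]


if __name__ == "__main__":
    cloud = [(0.1, 0.1, 0.2), (0.12, 0.05, 0.35), (-0.4, -0.4, 0.1)]
    for row in elevation_map(cloud):
        print(row)
```

Keeping the highest return per cell is one common convention for height maps; a real pipeline would also need to handle sensor noise, occlusion, and robot motion, none of which is modeled here.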

Fig. 3. Real-world Experiments. PILOT successfully executes object transport tasks across challenging terrains. The robo

Unified framework: A single policy seamlessly integrates perception-aware locomotion and large-workspace whole-body control.
Multimodal perception: An attention-based multi-scale perception encoder improves foot placement accuracy and terrain awareness.
Motor skill coordination: The MoE structure enables natural transitions and coordination between locomotion and manipulation.
Real-world performance: On the Unitree G1 humanoid robot, PILOT achieves superior stability and command tracking precision on complex terrain such as stairs and high steps.
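
As a rough illustration of attention-based fusion, the sketch below lets a proprioceptive vector act as a query over a set of terrain patch features, then concatenates the attended terrain summary with the proprioceptive vector. It is single-scale, uses no learned projections, and the names `attend` and `fuse` are hypothetical; it is not the encoder described in the paper.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query, patches):
    """Scaled dot-product attention of one query over terrain patch features."""
    scale = math.sqrt(len(query))
    scores = [sum(q * k for q, k in zip(query, p)) / scale for p in patches]
    weights = softmax(scores)
    dim = len(patches[0])
    # Weighted sum of patch features (patches serve as both keys and values).
    pooled = [sum(w * p[i] for w, p in zip(weights, patches)) for i in range(dim)]
    return pooled, weights


def fuse(proprio, patches):
    """Concatenate the proprioceptive vector with the attended terrain summary."""
    pooled, weights = attend(proprio, patches)
    return proprio + pooled, weights


if __name__ == "__main__":
    context, weights = fuse([1.0, 0.0], [[1.0, 0.0], [0.0, 1.0]])
    print(context, weights)
```

The attention weights indicate which terrain patches contribute most to the fused context, which is the intuition behind using attention to sharpen foot placement.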

Fig. 2. Visualization of expert activation across six motion modes. The

A robot-centric LiDAR-based elevation map captures the surrounding terrain.
A cross-modal context encoder fuses prediction-based proprioceptive features with an attention-based perceptive representation.
Random command sampling covers the feasible command space comprehensively, mitigating distribution bias.
A Mixture-of-Experts policy architecture coordinates diverse motor skills.
A VR interface supports teleoperation, and hierarchical RL supports autonomous task execution.
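
The Mixture-of-Experts idea in the list above can be sketched as a gate that produces softmax weights over several expert networks, whose outputs are then blended into one action. The class `MoEPolicy`, the linear experts, and the dimensions below are illustrative assumptions; the paper's actual network sizes and gating inputs are not given in this summary.

```python
import math
import random


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, x)) for row in weights]


class MoEPolicy:
    """Toy MoE policy: a linear gate blends the outputs of linear experts."""

    def __init__(self, obs_dim, act_dim, num_experts, seed=0):
        rng = random.Random(seed)

        def rand_matrix(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        self.gate = rand_matrix(num_experts, obs_dim)
        self.experts = [rand_matrix(act_dim, obs_dim) for _ in range(num_experts)]

    def act(self, obs):
        # Gate weights are positive and sum to one across experts.
        gate_weights = softmax(linear(self.gate, obs))
        outputs = [linear(expert, obs) for expert in self.experts]
        act_dim = len(outputs[0])
        action = [
            sum(g * out[i] for g, out in zip(gate_weights, outputs))
            for i in range(act_dim)
        ]
        return action, gate_weights


if __name__ == "__main__":
    policy = MoEPolicy(obs_dim=4, act_dim=3, num_experts=2)
    action, gate_weights = policy.act([0.1, 0.2, 0.3, 0.4])
    print(action, gate_weights)
```

Inspecting `gate_weights` per motion mode is the kind of analysis that an expert-activation visualization would rely on.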

A single unified policy for perception-aware whole-body control overcomes the limitations of existing decoupled approaches.
The cross-modal context encoder provides a principled fusion of proprioceptive and exteroceptive features.
Applying the MoE structure to humanoid loco-manipulation yields automatic specialization across motor skills.
Progressive random command sampling replaces motion capture data, mitigating distribution bias.
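
One plausible reading of progressive random command sampling is a curriculum in which commands are drawn uniformly from a range that widens as training progresses. The sketch below follows that reading; the command names, the limits in `COMMAND_LIMITS`, and the linear widening schedule are hypothetical and are not taken from the paper.

```python
import random

# Hypothetical command limits; the source does not list the actual ranges.
COMMAND_LIMITS = {
    "forward_velocity": (-1.0, 1.0),
    "yaw_rate": (-1.0, 1.0),
    "base_height": (0.3, 0.8),
}


def sample_command(progress, rng):
    """Sample a command uniformly from a range that widens with progress.

    progress is clamped to [0, 1]; at 0 every command sits at the center of
    its range, and at 1 the full range is covered.
    """
    progress = min(1.0, max(0.0, progress))
    command = {}
    for name, (low, high) in COMMAND_LIMITS.items():
        center = 0.5 * (low + high)
        half_width = 0.5 * (high - low) * progress
        command[name] = rng.uniform(center - half_width, center + half_width)
    return command


if __name__ == "__main__":
    rng = random.Random(0)
    for progress in (0.0, 0.5, 1.0):
        print(progress, sample_command(progress, rng))
```

Sampling uniformly over the range, rather than replaying recorded motions, is what lets this kind of scheme cover commands that a motion capture dataset might under-represent.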

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

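Overall assessment: PILOT presents an integrated, practical solution to humanoid loco-manipulation, making a technical contribution through cross-modal perception and the MoE structure and demonstrating a successful deployment on a real robot.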

Alternative approaches

Unlike approaches that integrate low-level and high-level control, this work directly learns whole-body locomotion control through a Perceptive Integrated Low-level Controller.