RGMP: Recurrent Geometric-prior Multimodal Policy for Generalizable Humanoid Robot Manipulation

Authors: Xuetao Li, Wenke Huang, Nengyuan Pan, Kaiyan Zhao, Songhua Yang, Yiming Wang, Mengde Li, Mang Ye, Jifeng Xuan, Miao Li | Date: 2025-11-12 | URL: https://arxiv.org/abs/2511.09141


Essence

Figure 2: Pipeline of RGMP. Upon receiving a speech command, the robot utilizes GSS to identify and localize the target

RGMP combines geometric reasoning with data efficiency: for humanoid robot manipulation, it integrates a Geometric-prior Skill Selector (GSS) and an Adaptive Recursive Gaussian Network (ARGN), achieving an 87% success rate and 5x data efficiency.

Motivation

Achievement

Figure 1: Overview of our framework. By applying seman-

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: RGMP๋Š” ๊ธฐํ•˜ํ•™์  ์ถ”๋ก ๊ณผ ๋ฐ์ดํ„ฐ ํšจ์œจ์„ฑ์˜ ๊ฒฐํ•ฉ์„ ํ†ตํ•ด humanoid robot ์กฐ์ž‘์˜ ์ค‘์š”ํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜๋ฉฐ, GSS์™€ ARGN์˜ ์„ค๊ณ„๊ฐ€ ์ •๊ตํ•˜๊ณ  ์‹ค์ œ ๋กœ๋ด‡์—์„œ strong empirical result๋ฅผ ๋‹ฌ์„ฑํ•œ ์šฐ์ˆ˜ํ•œ ์—ฐ๊ตฌ์ด๋‹ค. ๋‹ค๋งŒ ๊ธฐํ•˜ํ•™์  ์ œ์•ฝ์˜ ์ž๋™ํ™”์™€ ๋” ๊ด‘๋ฒ”์œ„ํ•œ ์‹ค์ฆ ํ‰๊ฐ€๊ฐ€ ์ด๋ฃจ์–ด์ง„๋‹ค๋ฉด ๋”์šฑ ๊ฐ•๋ ฅํ•  ๊ฒƒ์œผ๋กœ ํŒ๋‹จ๋œ๋‹ค.
