MeshMimic: Geometry-Aware Humanoid Motion Learning through 3D Scene Reconstruction

Authors: Qiang Zhang, Jiahao Ma, Peiran Liu, Shuai Shi, Zeran Su, Zifan Wang, Jingkai Sun, Wei Cui, Jialin Yu, Gang Han, Wen Zhao, Pihai Sun, Kangning Yin, Jiaxu Wang, Jiahang Cao, Lingfeng Zhang, Hao Cheng, Xiaoshuai Hao, Yiding Ji, Junwei Liang, Jian Tang, Renjing Xu, Yijie Guo | Date: 2026-02-17 | DOI: 10.48550/arXiv.2602.15733


Essence

Figure 1: MeshMimic: monocular video-to-humanoid robots. From ordinary consumer monocular videos (no

MeshMimic is a framework that lets humanoid robots learn to interact with complex terrain by reconstructing the 3D scene from a single monocular video. It uses Kinematic Consistency Optimization and contact-aware retargeting to transfer coupled motion-terrain interactions accurately to the robot.

Motivation

Achievement

Figure 2: MeshMimic Real-to-Sim. In-the-wild monocular videos yield long-horizon motions over complex

How

Figure 3: MeshMimic Real-Sim-Real Pipeline. Starting from a monocular video, we reconstruct the scene

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: MeshMimic creatively combines 3D vision with embodied intelligence to offer a cost-efficient, scalable approach to training humanoid robots. By using physical consistency optimization and contact-aware retargeting to achieve stable interaction on complex terrain, it makes a substantial contribution to robot control.
