EgoMI: Learning Active Vision and Whole-Body Manipulation from Egocentric Human Demonstrations

Authors: Justin Yu, Yide Shentu, Di Wu, Pieter Abbeel, Ken Goldberg, Philipp Wu | Date: 2025-10-31 | URL: https://arxiv.org/abs/2511.00153


Essence

Figure 1

Fig. 1: Overview of the EgoMI framework. EgoMI captures egocentric human demonstrations with synchronized head and hand

EgoMI is an egocentric data-collection framework that captures synchronized human head and hand motion. It handles rapid viewpoint changes through the SPARKS memory mechanism and achieves zero-shot transfer to a semi-humanoid robot.

Motivation

Achievement

Figure 4

Fig. 4: Tabletop Task Rollout Sequence: (Left). The images show a real 29D policy evaluation rollout where the robot (1)

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: EgoMI is a creative framework that captures human active vision and manipulation together. Its SPARKS mechanism handles rapid viewpoint changes gracefully, and the resulting zero-shot transfer offers a practical answer to the embodiment gap in imitation learning.
