EgoMimic: Scaling Imitation Learning via Egocentric Video

์ €์ž: Simar Kareer, Dhruv Patel, Ryan Punamiya, Pranay Mathur, Shuo Cheng, Chen Wang, Judy Hoffman, Danfei Xu | ๋‚ ์งœ: 2024-10-31 | URL: https://arxiv.org/abs/2410.24221 📄 PDF


Essence

Figure 1

Fig. 1: EgoMimic unlocks human embodiment data (egocentric videos paired with 3D hand tracks) as a new scalable data source…

EgoMimic is a full-stack framework that uses human egocentric video and 3D hand-tracking data, collected with Project Aria glasses, to train robot manipulation policies. It treats human and robot data as equally valid embodied demonstrations and learns a single unified policy from both.
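As a rough illustration of what "treating human and robot data as equal embodied demonstrations" could look like at the data-loading level, here is a minimal Python sketch. It is not the authors' implementation: the `Demo` record, its field names, and the uniform sampling over the pooled list are assumptions made purely for illustration, and this review does not describe the paper's actual architecture or training objective.

```python
import random
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class Demo:
    """One demonstration step. All field names here are hypothetical."""
    embodiment: str           # "human" or "robot"
    image_id: str             # stand-in for an egocentric RGB frame
    action: Sequence[float]   # stand-in for a 3D hand or end-effector target


def build_unified_dataset(human: List[Demo], robot: List[Demo]) -> List[Demo]:
    """Pool both sources into one list so neither is treated as auxiliary."""
    return list(human) + list(robot)


def sample_batch(dataset: List[Demo], batch_size: int,
                 rng: random.Random) -> List[Demo]:
    """Draw a batch uniformly from the pooled data, regardless of source."""
    return [rng.choice(dataset) for _ in range(batch_size)]


# Toy data: more human steps than robot steps (an assumed ratio).
human = [Demo("human", f"h{i}", (0.1 * i, 0.0, 0.2)) for i in range(8)]
robot = [Demo("robot", f"r{i}", (0.0, 0.1 * i, 0.3)) for i in range(2)]

dataset = build_unified_dataset(human, robot)
batch = sample_batch(dataset, batch_size=4, rng=random.Random(0))
```

In this sketch a single policy would consume every element of `batch` through the same interface; the `embodiment` tag is kept only so that a learner could condition on it or log it if needed.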

Motivation

Achievement

Figure 5

Fig. 5: We evaluate EgoMimic across three real world, long-horizon manipulation tasks. See Sec. IV-A for description.

How

Figure 2

Fig. 2: Our human data system uses Aria glasses to capture egocentric RGB and uses its side SLAM cameras to localize the…

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: EgoMimic์€ ์ธ๊ฐ„์˜ ์ผ์ธ์นญ ์‹œ์  ๋ฐ์ดํ„ฐ๋ฅผ ๋กœ๋ด‡ ํ•™์Šต์— ๋™๋“ฑํ•˜๊ฒŒ ํ™œ์šฉํ•˜๋Š” ํ˜์‹ ์  ์ ‘๊ทผ์œผ๋กœ, ์‹ค์ œ ์กฐ์ž‘ ์ž‘์—…์—์„œ ๋›ฐ์–ด๋‚œ ์„ฑ๋Šฅ ๊ฐœ์„ ๊ณผ ์ผ๋ฐ˜ํ™”๋ฅผ ์ž…์ฆํ–ˆ์œผ๋ฉฐ, ์ˆ˜๋™์  ๋Œ€๊ทœ๋ชจ ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘์˜ ๊ฐ€๋Šฅ์„ฑ์„ ์—ด์–ด ๋กœ๋ด‡ ํ•™์Šต์˜ ํ™•์žฅ์„ฑ ๋ฌธ์ œ ํ•ด๊ฒฐ์— ํฌ๊ฒŒ ๊ธฐ์—ฌํ•œ๋‹ค.
