ZeroMimic: Distilling Robotic Manipulation Skills from Web Videos

Authors: Junyao Shi, Zhuolun Zhao, Tianyou Wang, Ian Pedroza, Amy Luo, Jie Wang, Jason Ma, Dinesh Jayaraman | Date: 2025-03-31 | URL: https://arxiv.org/abs/2503.23877


Essence

Figure 1

Fig. 1: ZeroMimic distills robotic manipulation skills from egocentric web videos for zero-shot deployment across divers

ZeroMimic is the first system to distill robot manipulation skills directly from ordinary human videos in the EpicKitchens dataset, producing image-goal-conditioned skill policies that can be deployed immediately, without robot-specific demonstrations or exploration.

Motivation

Achievement

Figure 5

Fig. 5: ZeroMimic Zero-Shot Performance Overview. ZeroMimic demonstrates strong generalization capabilities, achieving

How

Figure 3

Fig. 3: ZeroMimic is composed of the grasping phase and the post-grasp phase. The grasping phase (top) leverages

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: ZeroMimic presents a practical and scalable approach to distilling robot manipulation skills directly from in-the-wild human videos, and it demonstrates real-world viability with a realistic success rate of around 71%. It is an important step toward easing the data bottleneck in robot learning, but broadening the scope of the evaluation and strengthening the failure analysis remain future work.
