MimicPlay: Long-Horizon Imitation Learning by Watching Human Play

์ €์ž: Chen Wang, Linxi Fan, Jiankai Sun, Ruohan Zhang, Li Fei-Fei, Danfei Xu, Yuke Zhu, Anima Anandkumar | ๋‚ ์งœ: 2023-02-24 | URL: https://arxiv.org/abs/2302.12422 📄 PDF


Essence

Figure 1: Human is able to complete a long-horizon task much faster than a teleoperated robot. This

MimicPlay is a hierarchical imitation learning framework that learns high-level plans from low-cost human play data and low-level control policies from a small amount of teleoperation data, substantially improving data efficiency on long-horizon manipulation tasks.
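The two-level split described above can be illustrated with a minimal Python sketch. This is not the authors' implementation: `HierarchicalPolicy`, `toy_planner`, `toy_controller`, and `rollout` are hypothetical names, and the two toy functions are hand-written stand-ins for what would be learned models (a planner trained on human play data and a controller trained on teleoperation data). The sketch only shows how a high-level plan feeds a low-level action at each step.

```python
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


@dataclass
class HierarchicalPolicy:
    """Two-level policy: a planner proposes a plan, a controller turns it into an action."""

    # High-level planner: (observation, goal) -> plan.
    # Assumption: in the framework described above, this part would be learned from human play data.
    planner: Callable[[Vector, Vector], Vector]
    # Low-level controller: (observation, plan) -> action.
    # Assumption: this part would be learned from a small amount of teleoperation data.
    controller: Callable[[Vector, Vector], Vector]

    def act(self, observation: Vector, goal: Vector) -> Vector:
        plan = self.planner(observation, goal)
        return self.controller(observation, plan)


def toy_planner(observation: Vector, goal: Vector) -> Vector:
    # Hypothetical stand-in for a learned planner: the plan is the offset to the goal.
    return [g - o for o, g in zip(observation, goal)]


def toy_controller(observation: Vector, plan: Vector, max_step: float = 0.1) -> Vector:
    # Hypothetical stand-in for a learned controller: follow the plan,
    # clipping each component to a bounded step size.
    return [max(-max_step, min(max_step, p)) for p in plan]


def rollout(policy: HierarchicalPolicy, start: Vector, goal: Vector, steps: int) -> Vector:
    # Re-plan at every step and apply the resulting action to a point-mass state.
    state = list(start)
    for _ in range(steps):
        action = policy.act(state, goal)
        state = [s + a for s, a in zip(state, action)]
    return state


policy = HierarchicalPolicy(planner=toy_planner, controller=toy_controller)
final_state = rollout(policy, start=[0.0, 0.0], goal=[0.5, -0.3], steps=10)
```

Keeping the planner and controller as separate callables mirrors the idea that each level can be trained on a different data source and swapped independently.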

Motivation

Achievement

Figure 4: Evaluation of multi-task policy

How

Figure 2: Overview of MIMICPLAY. (a) Training Stage 1: using cheap human play data to train a

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: MimicPlay๋Š” ๋ฐ์ดํ„ฐ ์ˆ˜์ง‘ ๋น„์šฉ์ด๋ผ๋Š” ๋ชจ๋ฐฉ ํ•™์Šต์˜ ๊ทผ๋ณธ์  ๋ฌธ์ œ๋ฅผ ์ฐฝ์˜์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋ฉด์„œ ์‹ค์ œ ๋กœ๋ด‡ ์ž‘์—…์—์„œ ์šฐ์ˆ˜ํ•œ ์„ฑ๋Šฅ์„ ์ž…์ฆํ•œ ์˜๋ฏธ์žˆ๋Š” ์—ฐ๊ตฌ์ด๋‹ค. ์ธ๊ฐ„๊ณผ ๋กœ๋ด‡ ๋ฐ์ดํ„ฐ์˜ ์ƒ๋ณด์  ํ™œ์šฉ์ด๋ผ๋Š” ์ƒˆ๋กœ์šด ํŒจ๋Ÿฌ๋‹ค์ž„์€ ๋กœ๋ด‡ ํ•™์Šต์˜ ํ™•์žฅ์„ฑ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚ฌ ์ˆ˜ ์žˆ๋Š” ์ž ์žฌ๋ ฅ์„ ๋ณด์—ฌ์ค€๋‹ค.
