Latent Action Pretraining from Videos

Authors: Seonghyeon Ye, Joel Jang, Byeongguk Jeon, Sejune Joo, Jianwei Yang, Baolin Peng, Ajay Mandlekar, Reuben Tan, Yu-Wei Chao, Bill Yuchen Lin, Lars Liden, Kimin Lee, Jianfeng Gao, Luke Zettlemoyer, Dieter Fox, Minjoon Seo | Date: 2024-10-15 | URL: https://arxiv.org/abs/2410.11758


Essence

Figure 2: Overview of Latent Action Pretraining. (1) Latent Action Quantization: We first learn discrete

The paper proposes an unsupervised learning method that combines VQ-VAE-based latent action quantization with Vision-Language-Action model pretraining in order to learn robot actions from internet-scale unlabeled videos.
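The VQ-VAE-based latent action quantization named above relies on vector quantization: a continuous encoder output is snapped to the nearest entry of a learned codebook, and that entry's index serves as a discrete token. The lookup step can be sketched as follows, assuming a hand-written 2-D codebook and a hypothetical `quantize` helper. The paper's actual encoder, codebook size, and training objective are not described in this review and are not reproduced here.

```python
# Illustrative only: a toy nearest-codebook quantizer, not the paper's model.
# The codebook values and the `quantize` helper are hypothetical.

def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def quantize(vector, codebook):
    """Return (index, entry) of the codebook entry closest to `vector`."""
    index = min(
        range(len(codebook)),
        key=lambda i: squared_distance(vector, codebook[i]),
    )
    return index, codebook[index]


# Hypothetical 2-D codebook holding four discrete "latent action" entries.
CODEBOOK = [
    [1.0, 0.0],   # token 0
    [-1.0, 0.0],  # token 1
    [0.0, 1.0],   # token 2
    [0.0, -1.0],  # token 3
]

# A made-up continuous latent, standing in for an encoder output that
# summarizes the change between two consecutive video frames.
continuous_latent = [0.8, 0.3]
token, entry = quantize(continuous_latent, CODEBOOK)
print(token, entry)
```

In a trained VQ-VAE the codebook is learned jointly with the encoder and decoder. Here it is fixed by hand so that only the discretization step is visible.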

Motivation

Achievement

Figure 3: Real-world Tabletop Manipulation Results. We evaluate on a total of 54 rollouts for each model

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: An innovative approach that removes the dependence on action labels, a major constraint in robot learning. It enables the use of internet-scale data through unsupervised learning and demonstrates real-world performance gains over state-of-the-art methods, making it a very important study.
