DreamGen: Unlocking Generalization in Robot Learning through Video World Models

Authors: Joel Jang, Seonghyeon Ye, Zongyu Lin, Jiannan Xiang, Johan Bjorck, Yu Fang, Fengyuan Hu, Spencer Huang, Kaushil Kundalia, Yen-Chen Lin, Loic Magne, Ajay Mandlekar, Avnish Narayan, You Liang Tan, Guanzhi Wang, Jing Wang, Qi Wang, Yinzhen Xu, Xiaohui Zeng, Kaiyuan Zheng, Ruijie Zheng, Ming-Yu Liu, Luke Zettlemoyer, Dieter Fox, Jan Kautz, Scott Reed, Yuke Zhu, Linxi Fan | Date: 2025-05-19 | URL: https://arxiv.org/abs/2505.12705


Essence

Figure 2: DREAMGEN Overview. We begin by fine-tuning a video world model on teleoperated robot trajectories.

DreamGen is a four-stage pipeline that uses a video world model to learn robot policies from minimal teleoperation data, achieving generalization to new behaviors and new environments.
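The review does not enumerate the four stages, so the sketch below assumes the breakdown given in the paper's abstract: fine-tune the video world model, prompt it with initial frames and language instructions to generate videos, label those videos with pseudo-actions (for example with an IDM), and train a policy on the resulting neural trajectories. Every function name, type alias, and the toy arithmetic here is a hypothetical stand-in chosen for illustration, not the authors' code or API.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Frame = List[float]   # toy stand-in for an image (assumption)
Action = List[float]  # toy stand-in for a robot action vector (assumption)


@dataclass
class NeuralTrajectory:
    """A generated video paired with pseudo-actions."""
    instruction: str
    frames: List[Frame]
    actions: List[Action]  # one pseudo-action per consecutive frame pair


def finetune_world_model(teleop_videos: Sequence[List[Frame]]) -> Callable[..., List[Frame]]:
    """Stage 1 (stub): a real system would fine-tune a video model here.

    This toy version ignores the data and returns a generator that shifts
    every frame value by a fixed step derived from the instruction length.
    """
    def generate(first_frame: Frame, instruction: str, horizon: int) -> List[Frame]:
        step = 0.1 * len(instruction.split())
        frames = [list(first_frame)]
        for _ in range(horizon):
            frames.append([v + step for v in frames[-1]])
        return frames

    return generate


def label_pseudo_actions(frames: List[Frame]) -> List[Action]:
    """Stage 3 (stub): IDM stand-in that uses frame differences as actions."""
    return [[b - a for a, b in zip(f0, f1)] for f0, f1 in zip(frames, frames[1:])]


def build_neural_trajectories(
    teleop_videos: Sequence[List[Frame]],
    prompts: Sequence[Tuple[Frame, str]],
    horizon: int = 4,
) -> List[NeuralTrajectory]:
    world_model = finetune_world_model(teleop_videos)             # stage 1
    dataset = []
    for first_frame, instruction in prompts:
        frames = world_model(first_frame, instruction, horizon)   # stage 2
        actions = label_pseudo_actions(frames)                    # stage 3
        dataset.append(NeuralTrajectory(instruction, frames, actions))
    return dataset


def to_training_pairs(dataset: List[NeuralTrajectory]) -> List[Tuple[Frame, str, Action]]:
    """Stage 4 (stub): flatten into (observation, instruction, action) samples
    that a downstream visuomotor policy could be trained on."""
    return [
        (frame, traj.instruction, action)
        for traj in dataset
        for frame, action in zip(traj.frames, traj.actions)
    ]


if __name__ == "__main__":
    data = build_neural_trajectories([], [([0.0, 1.0], "pick up cup")], horizon=3)
    print(len(data), len(to_training_pairs(data)))
```

The point of the structure is that only stage 1 touches teleoperated data; stages 2 to 4 can be scaled by adding prompts, which is where the generalization to new behaviors and environments is claimed to come from.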

Motivation

Achievement

Figure 4: Scaling # of Neural Trajectories in RoboCasa. We vary the sizes of neural trajectories (x-axis) and

How

Figure 3 shows the (a) architecture we use to train the IDM model and the (b) architecture that we use to train the

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: DreamGen recasts video world models as an efficient data-generation tool for robot learning, presenting an innovative and practical approach that generalizes across diverse behaviors and environments from minimal teleoperation data. It also provides real-world validation across multiple embodiments and a systematic evaluation tool, DreamGen Bench, pointing to a new direction for scaling robot learning.
