Scaling Robot Learning with Semantically Imagined Experience

Authors: Tianhe Yu, Ted Xiao, Austin Stone, Jonathan Tompson, Anthony Brohan, Su Wang, Jaspiar Singh, Clayton Tan, Dee M, Jodilyn Peralta, Brian Ichter, Karol Hausman, Fei Xia | Date: 2023-02-22 | URL: https://arxiv.org/abs/2302.11550


Essence

Figure 1: We propose using text-guided diffusion models for data augmentation within the sphere

ROSIE proposes a method that semantically augments existing robot manipulation data through inpainting with a text-to-image diffusion model, improving the robot's ability to generalize to new objects and environments.

Motivation

Achievement

Figure 4: Augmentations of in-hand objects during manipulation. We show examples where ROSIE

How

Figure 2: The proposed architecture of ROSIE. First, we localize the augmentation region with open
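
As a rough illustration of the localize-then-inpaint idea this review describes, the sketch below replaces only a masked region of each frame with generated content and leaves every other pixel untouched. It is not the authors' implementation: `augment_frame`, `augment_episode`, the `InpaintFn` signature, and the stand-in `fill_gray` model are hypothetical names, the mask is assumed to be given, and reusing the recorded actions unchanged is an assumption of this sketch.

```python
from typing import Callable, List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = List[List[bool]]
# Hypothetical signature: (image, mask, text prompt) -> image of the same size.
InpaintFn = Callable[[Image, Mask, str], Image]


def augment_frame(image: Image, mask: Mask, prompt: str,
                  inpaint_fn: InpaintFn) -> Image:
    """Return a copy of `image` whose masked pixels come from `inpaint_fn`."""
    if len(mask) != len(image) or any(
        len(m_row) != len(i_row) for m_row, i_row in zip(mask, image)
    ):
        raise ValueError("mask must have the same height and width as image")
    generated = inpaint_fn(image, mask, prompt)
    # Composite: generated pixel inside the mask, original pixel outside it.
    return [
        [gen if inside else orig
         for orig, gen, inside in zip(img_row, gen_row, mask_row)]
        for img_row, gen_row, mask_row in zip(image, generated, mask)
    ]


def augment_episode(frames: Sequence[Image], masks: Sequence[Mask],
                    actions: Sequence[object], prompt: str,
                    inpaint_fn: InpaintFn) -> Tuple[List[Image], List[object]]:
    """Augment every frame of an episode.

    Assumption of this sketch: only pixels are edited, so the recorded
    actions are copied over unchanged.
    """
    if not (len(frames) == len(masks) == len(actions)):
        raise ValueError("frames, masks, and actions must have equal length")
    new_frames = [augment_frame(f, m, prompt, inpaint_fn)
                  for f, m in zip(frames, masks)]
    return new_frames, list(actions)


def fill_gray(image: Image, mask: Mask, prompt: str) -> Image:
    """Stand-in for a diffusion inpainting model: ignores the prompt and
    returns a uniformly gray image of the same size."""
    return [[(128, 128, 128) for _ in row] for row in image]


# Example: a 3x3 black frame with only the center pixel marked for editing.
frame = [[(0, 0, 0)] * 3 for _ in range(3)]
mask = [[False] * 3 for _ in range(3)]
mask[1][1] = True
augmented = augment_frame(frame, mask, "a blue cloth", fill_gray)
```

In practice `fill_gray` would be swapped for a real text-guided inpainting model and the mask would come from a segmentation step; both are outside the scope of this sketch.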

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: ROSIE creatively applies recent text-to-image diffusion models to robot learning, presenting a practical way to generate semantically diverse training data without costly real-world data collection. Extensive experiments demonstrate generalization to new objects, robustness to backgrounds and distractors, and improvements on high-level tasks, and the work is likely to have a high impact on the robot learning community.
