NoMaD: Goal Masked Diffusion Policies for Navigation and Exploration

Authors: Ajay Sridhar, Dhruv Shah, Catherine Glossop, Sergey Levine | Date: 2023-10-11 | URL: https://arxiv.org/abs/2310.07896


Essence

Figure 1

Fig. 1: NoMaD is the first flexibly conditioned diffusion model of robot actions that can perform both goal-conditioned

NoMaD is a unified diffusion policy that uses goal masking to handle both goal-directed navigation and goal-agnostic exploration with a single model. It combines a Transformer-based policy with a diffusion model decoder to navigate effectively in previously unseen environments.

Motivation

Achievement

How

Figure 2

Fig. 2: Model Architecture. NoMaD uses two EfficientNet encoders ψ, ϕ to generate input tokens to a Transformer decoder.
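The goal-masking idea can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`softmax`, `context_vector`), the placement of the goal token in the last row, the single-head attention without learned projections, and the mean pooling are all assumptions chosen for brevity. The sketch shows only the core mechanism: when the goal is masked, no token may attend to the goal token, so the resulting context vector is independent of the goal and the policy can be used for undirected exploration.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    """Numerically stable softmax along the given axis."""
    x = x - np.max(x, axis=axis, keepdims=True)
    e = np.exp(x)
    return e / np.sum(e, axis=axis, keepdims=True)


def context_vector(tokens: np.ndarray, goal_masked: bool) -> np.ndarray:
    """Pool a token sequence into one context vector (hypothetical helper).

    tokens: array of shape (n, d). By assumption, the first n-1 rows are
            observation tokens and the last row is the goal token.
    goal_masked: if True, the goal token is hidden from attention and
                 excluded from pooling (exploration mode).
    """
    n, d = tokens.shape
    # Single-head self-attention scores with no learned projections.
    scores = tokens @ tokens.T / np.sqrt(d)
    if goal_masked:
        # Block every query from attending to the goal key.
        scores[:, -1] = -np.inf
    weights = softmax(scores, axis=-1)
    attended = weights @ tokens
    if goal_masked:
        # Drop the goal row so its query cannot leak into the context.
        attended = attended[:-1]
    return attended.mean(axis=0)


rng = np.random.default_rng(0)
obs_tokens = rng.normal(size=(3, 4))   # stand-in for observation encoder outputs
goal_a = rng.normal(size=(1, 4))       # stand-in for a goal encoder output
goal_b = rng.normal(size=(1, 4))       # a different goal

masked_a = context_vector(np.vstack([obs_tokens, goal_a]), goal_masked=True)
masked_b = context_vector(np.vstack([obs_tokens, goal_b]), goal_masked=True)
open_a = context_vector(np.vstack([obs_tokens, goal_a]), goal_masked=False)
open_b = context_vector(np.vstack([obs_tokens, goal_b]), goal_masked=False)
```

With the mask on, `masked_a` and `masked_b` coincide even though the goals differ; with the mask off, `open_a` and `open_b` differ. In a full model, this context vector would condition the diffusion decoder that produces actions.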

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: NoMaD presents an innovative architecture that unifies exploration and goal-seeking by combining goal masking with a diffusion policy. On real robots it achieves a performance improvement of more than 25% and a 15x efficiency gain over ViNT, a substantial contribution to robot navigation.
