Efficient Diffusion Transformer Policies with Mixture of Expert Denoisers for Multitask Learning

Authors: Moritz Reuss, Jyothish Pari, Pulkit Agrawal, Rudolf Lioutikov | Date: 2024-12-17 | URL: https://arxiv.org/abs/2412.12953


Essence

Figure 1: The proposed MoDE architecture (left) uses a transformer with causal masking, where each

MoDE is an efficient Imitation Learning policy that applies a Mixture-of-Experts architecture to Diffusion Policy. Through noise-conditioned routing and noise-conditioned self-attention, it achieves higher performance with 90% fewer FLOPs while reducing parameters by 40%.
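To make the idea of noise-conditioned routing concrete, the following is a minimal sketch rather than the authors' implementation. It assumes a router that scores experts from an embedding of the current noise level only, so the chosen experts are the same for every token at that denoising step and can be computed once per noise level ahead of time. The sinusoidal embedding, the linear router, the top-k selection, the expert count, and the noise schedule are all hypothetical choices made for illustration.

```python
import math
import random


def noise_embedding(sigma, dim):
    """Sinusoidal embedding of a scalar noise level (illustrative assumption)."""
    half = dim // 2
    freqs = [math.exp(-math.log(10000.0) * i / half) for i in range(half)]
    log_sigma = math.log(sigma)
    sines = [math.sin(log_sigma * f) for f in freqs]
    cosines = [math.cos(log_sigma * f) for f in freqs]
    return sines + cosines


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def route(sigma, router_weights, top_k):
    """Select top_k experts from the noise level alone; tokens are never consulted."""
    emb = noise_embedding(sigma, len(router_weights[0]))
    logits = [sum(w * e for w, e in zip(row, emb)) for row in router_weights]
    probs = softmax(logits)
    chosen = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:top_k]
    norm = sum(probs[i] for i in chosen)
    # Renormalize so the mixing weights of the selected experts sum to 1.
    return [(i, probs[i] / norm) for i in chosen]


def precompute_routes(sigmas, router_weights, top_k):
    """Cache the expert choice per noise level; valid because routing ignores tokens."""
    return {sigma: route(sigma, router_weights, top_k) for sigma in sigmas}


# Hypothetical sizes and a random, untrained router, for demonstration only.
rng = random.Random(0)
NUM_EXPERTS, EMB_DIM, TOP_K = 4, 8, 2
router_weights = [[rng.gauss(0.0, 1.0) for _ in range(EMB_DIM)]
                  for _ in range(NUM_EXPERTS)]
sigmas = [80.0, 10.0, 1.0, 0.1, 0.01]  # hypothetical denoising schedule

cache = precompute_routes(sigmas, router_weights, TOP_K)
for sigma in sigmas:
    print(sigma, cache[sigma])
```

Because the routing decision in this sketch depends only on the noise level, the lookup table built by `precompute_routes` can replace the router at inference time, which is one plausible way such a design could save compute.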

Motivation

Achievement

Figure 2: After training MoDE, the router is noise-conditioned, allowing pre-computation of the

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: MoDE is a strong contribution that uses the creative idea of noise-conditioned routing to dramatically improve the computational efficiency of Diffusion Policy while also improving performance. It is validated through extensive experiments and ablation studies, but it needs a stronger theoretical foundation and evaluation in more diverse domains.
