H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning

์ €์ž: Yiyang Lu, Yufeng Tian, Zhecheng Yuan, Xianbang Wang, Pu Hua, Zhengrong Xue, Huazhe Xu | ๋‚ ์งœ: 2025-05-12 | URL: https://arxiv.org/abs/2505.07819 📄 PDF


Essence

Figure 2: Overview of H3DP. H3DP integrates three hierarchical design principles across the

H³DP is a method that integrates depth-aware layering of RGB-D input, multi-scale visual representations, and a hierarchically conditioned diffusion process to strengthen the coupling between visual perception and action generation in visuomotor policy learning.
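
To make the first of these three ideas concrete, the sketch below shows one simple way depth-aware layering of RGB-D input could look: pixels are assigned to layers by depth, and each layer keeps only its own pixels. This is an illustration under stated assumptions, not the authors' implementation. The function name `depth_layers`, the flat list-of-pixels layout, and the uniform depth binning are all hypothetical choices made here; the review does not specify how the paper forms its layers.

```python
from typing import List, Tuple

# One RGB-D pixel as (r, g, b, depth). This flat layout is an assumption
# made for illustration only.
Pixel = Tuple[float, float, float, float]

ZERO: Pixel = (0.0, 0.0, 0.0, 0.0)


def depth_layers(pixels: List[Pixel], num_layers: int) -> List[List[Pixel]]:
    """Split RGB-D pixels into `num_layers` masked copies using uniform depth bins.

    Every returned layer has the same length as `pixels`. A pixel is kept in
    the layer whose depth bin contains it and zeroed out in all other layers.
    Uniform binning is a hypothetical choice, not the paper's stated scheme.
    """
    if num_layers < 1:
        raise ValueError("num_layers must be at least 1")
    if not pixels:
        return [[] for _ in range(num_layers)]

    depths = [p[3] for p in pixels]
    d_min, d_max = min(depths), max(depths)
    span = (d_max - d_min) or 1.0  # avoid division by zero on flat depth maps

    layers = [[ZERO] * len(pixels) for _ in range(num_layers)]
    for i, p in enumerate(pixels):
        # Map depth to a bin index; the farthest pixel is clamped into the last bin.
        k = min(int((p[3] - d_min) / span * num_layers), num_layers - 1)
        layers[k][i] = p
    return layers


if __name__ == "__main__":
    scene = [(1.0, 0.0, 0.0, 0.2), (0.0, 1.0, 0.0, 0.5), (0.0, 0.0, 1.0, 1.0)]
    for index, layer in enumerate(depth_layers(scene, num_layers=2)):
        print(index, layer)
```

In a sketch like this, each layer could then be encoded separately so that near and far content is not mixed in a single feature map; whether and how the paper does this is not stated in the review.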

Motivation

Achievement

Figure 1: H3DP can not only achieve superior performance across 44 tasks on 5 simulation bench-

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 3/5 | Overall: 4/5

์ดํ‰: HยณDP๋Š” visuomotor policy learning์˜ critical coupling ๋ฌธ์ œ๋ฅผ ๋ช…ํ™•ํ•˜๊ฒŒ ์‹๋ณ„ํ•˜๊ณ  human visual cortex์˜ ๊ณ„์ธต์  ์ฒ˜๋ฆฌ์—์„œ ์˜๊ฐ์„ ๋ฐ›์•„ ์ž…๋ ฅ๋ถ€ํ„ฐ ํ–‰๋™ ์ƒ์„ฑ๊นŒ์ง€ ์ผ๊ด€๋œ ๊ณ„์ธต์  ๊ตฌ์กฐ๋ฅผ ๊ตฌ์ถ•ํ•œ ํ˜์‹ ์  ์ ‘๊ทผ๋ฒ•์ด๋‹ค. ๊ด‘๋ฒ”์œ„ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ์ƒ๋‹นํ•œ ์„ฑ๋Šฅ ๊ฐœ์„ ์„ ์ž…์ฆํ–ˆ์œผ๋‚˜, ๋ณธ๋ฌธ์ด ๋ฐœ์ทŒ๋ณธ์œผ๋กœ ์ผ๋ถ€ ๊ธฐ์ˆ ์  ์„ธ๋ถ€์‚ฌํ•ญ์ด ๋ถˆ๋ช…ํ™•ํ•˜๊ณ  ์‹ค์ œ ๋กœ๋ด‡ ์‹คํ—˜์˜ ๊ทœ๋ชจ๊ฐ€ ๋‹ค์†Œ ์ œํ•œ์ ์ด๋ผ๋Š” ์ ์€ ๊ฐœ์„  ์—ฌ์ง€๊ฐ€ ์žˆ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •