H-RDT: Human Manipulation Enhanced Bimanual Robotic Manipulation

์ €์ž: Hongzhe Bi, Lingxuan Wu, Tianwei Lin, Hengkai Tan, Zhizhong Su, Hang Su, Jun Zhu | ๋‚ ์งœ: 2025-07-31 | URL: https://arxiv.org/abs/2507.23523 📄 PDF


Essence

Figure 1: Overview of H-RDT. A human-to-robotics diffusion transformer with two-stage training.

H-RDT is a two-stage, diffusion-transformer-based approach to robot manipulation learning: it pre-trains on large-scale egocentric human manipulation data and then fine-tunes on diverse robots through modular action encoders/decoders.
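A minimal sketch of the modular transfer idea, assuming that only the action encoder/decoder depend on the embodiment while the backbone is shared across embodiments. `ModularPolicy`, `attach_embodiment`, and the action sizes 48 (human hand) and 14 (bimanual robot) are hypothetical illustration choices, not H-RDT's actual code or dimensions, and a single linear layer stands in for the diffusion transformer.

```python
import numpy as np

rng = np.random.default_rng(0)

class Linear:
    """Affine layer y = x @ W + b (stand-in for a learned projection)."""
    def __init__(self, d_in, d_out):
        self.W = rng.normal(0.0, d_in ** -0.5, size=(d_in, d_out))
        self.b = np.zeros(d_out)

    def __call__(self, x):
        return x @ self.W + self.b

class ModularPolicy:
    """Shared backbone with swappable, embodiment-specific action encoder/decoder (hypothetical)."""
    def __init__(self, d_model):
        self.d_model = d_model
        self.backbone = Linear(d_model, d_model)   # placeholder for the shared transformer
        self.encoder = self.decoder = None

    def attach_embodiment(self, action_dim):
        # Replace only the action adapters; backbone weights are left untouched.
        self.encoder = Linear(action_dim, self.d_model)
        self.decoder = Linear(self.d_model, action_dim)

    def __call__(self, actions):
        # (horizon, action_dim) -> tokens -> shared backbone -> (horizon, action_dim)
        tokens = self.encoder(actions)
        hidden = np.tanh(self.backbone(tokens))
        return self.decoder(hidden)

policy = ModularPolicy(d_model=64)
policy.attach_embodiment(action_dim=48)            # hypothetical human-hand action size
human_out = policy(rng.normal(size=(16, 48)))
backbone_W = policy.backbone.W.copy()

policy.attach_embodiment(action_dim=14)            # hypothetical bimanual-robot action size
robot_out = policy(rng.normal(size=(16, 14)))
```

Swapping embodiments changes only the input/output projections, so whatever the backbone learned from human data is reused unchanged when the robot adapters are attached.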

Motivation

Achievement

Figure 3

Figure 3: Cross-embodiment multi-task performance on

How

Figure 2: H-RDT framework. Our approach consists of two main stages: (1) pre-training on large-scale human manipulation
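The two-stage recipe (pre-train on human data, then fine-tune on a robot with fresh action adapters) can be sketched as below. This is a toy under loud assumptions: a DDPM-style noise-prediction loss with a linear beta schedule (the review only says "diffusion transformer"; the paper's exact objective may differ), linear layers in place of the transformer, a scalar t/T timestep feature, no visual or language conditioning, and random arrays in place of real demonstrations. `add_noise`, `new_adapters`, `train`, and all sizes are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 100
betas = np.linspace(1e-4, 0.02, T)             # linear DDPM noise schedule (assumed)
alpha_bar = np.cumprod(1.0 - betas)
D_MODEL = 32

def add_noise(a0, t):
    """Forward diffusion: a_t = sqrt(alpha_bar_t) * a0 + sqrt(1 - alpha_bar_t) * eps."""
    eps = rng.normal(size=a0.shape)
    ab = alpha_bar[t][:, None]
    return np.sqrt(ab) * a0 + np.sqrt(1.0 - ab) * eps, eps

def new_adapters(action_dim):
    """Fresh embodiment-specific linear action encoder E and decoder D (hypothetical)."""
    return {"E": rng.normal(0.0, action_dim ** -0.5, (action_dim, D_MODEL)),
            "D": rng.normal(0.0, D_MODEL ** -0.5, (D_MODEL, action_dim))}

def train(params, actions, steps, lr=0.3):
    """Noise-prediction training of encoder -> shared backbone B -> decoder, plain gradient descent."""
    losses = []
    for _ in range(steps):
        t = rng.integers(0, T, size=len(actions))
        x, eps = add_noise(actions, t)
        tau = (t / T)[:, None]                      # scalar timestep feature
        h1 = x @ params["E"] + tau * params["w_t"]  # action tokens + time embedding
        h2 = h1 @ params["B"]                       # shared backbone (linear stand-in)
        pred = h2 @ params["D"]                     # predicted noise
        losses.append(np.mean((pred - eps) ** 2))
        g = 2.0 * (pred - eps) / pred.size          # dLoss/dpred
        gh2 = g @ params["D"].T
        gh1 = gh2 @ params["B"].T
        grads = {"D": h2.T @ g, "B": h1.T @ gh2, "E": x.T @ gh1,
                 "w_t": (tau * gh1).sum(axis=0)}
        for k, v in grads.items():                  # all grads use pre-update weights
            params[k] -= lr * v
    return losses

# Stage 1: pre-train backbone + human adapters on placeholder "human" actions.
params = {"B": np.eye(D_MODEL), "w_t": np.zeros(D_MODEL), **new_adapters(48)}
human_actions = rng.normal(size=(64, 48))
stage1 = train(params, human_actions, steps=200)

# Stage 2: keep the backbone (B, w_t), swap in fresh robot adapters, fine-tune.
backbone = {k: params[k].copy() for k in ("B", "w_t")}
robot_params = {**backbone, **new_adapters(14)}
robot_actions = rng.normal(size=(64, 14))
stage2 = train(robot_params, robot_actions, steps=200)
```

Only the adapter shapes change between stages; the backbone enters stage 2 with its pre-trained values rather than a fresh initialization, which is the mechanism the two-stage design relies on.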

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 4/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: H-RDT๋Š” ๋Œ€๊ทœ๋ชจ egocentric human manipulation ๋ฐ์ดํ„ฐ์˜ ๊ฐ€์น˜๋ฅผ ์ฒด๊ณ„์ ์œผ๋กœ ์ž…์ฆํ•˜๋ฉด์„œ, ๋ชจ๋“ˆ์‹ ์ „์ด ๊ตฌ์กฐ๋ฅผ ํ†ตํ•ด diverse robot platform์œผ๋กœ์˜ ํ™•์žฅ ๊ฐ€๋Šฅ์„ฑ์„ ๋ณด์—ฌ์ค€ ํ˜์‹ ์  ์—ฐ๊ตฌ์ด๋‹ค. ๊ด‘๋ฒ”์œ„ํ•œ ์‹คํ—˜๊ณผ ๊ฐ•๋ ฅํ•œ empirical ๊ฒฐ๊ณผ๊ฐ€ robotic manipulation ํ•™์Šต์˜ data scarcity ๋ฌธ์ œ ํ•ด๊ฒฐ์— ์‹ค์งˆ์ ์ธ ๊ธฐ์—ฌ๋ฅผ ํ•˜๊ณ  ์žˆ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •