RVT: Robotic View Transformer for 3D Object Manipulation

Authors: Ankit Goyal, Jie Xu, Yijie Guo, Valts Blukis, Yu-Wei Chao, Dieter Fox | Date: 2023-06-26 | URL: https://arxiv.org/abs/2306.14896


Essence

RVT applies a multi-view transformer to 3D object manipulation, addressing the computational cost of explicit 3D representations while achieving both high accuracy and scalability.

Motivation

Achievement

Figure 1

Figure 1: RVT scales and performs better

How

Figure 2

Figure 2: Overview of RVT. Given RGB-D from sensor(s), we first construct a point cloud of the
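The caption is cut off, so only its first stated step (building a point cloud from RGB-D input) can be illustrated here. The following is a minimal sketch of that step under a standard pinhole camera model. The function name `depth_to_point_cloud` and the intrinsics `fx`, `fy`, `cx`, `cy` are illustrative assumptions, not taken from the paper or from the RVT code.

```python
import numpy as np


def depth_to_point_cloud(depth, rgb, fx, fy, cx, cy):
    """Back-project a depth image into a colored point cloud (camera frame).

    Hypothetical helper: assumes a pinhole camera with focal lengths
    (fx, fy) and principal point (cx, cy), and depth in metric units.
    """
    h, w = depth.shape
    # Pixel coordinate grids: u runs along columns, v along rows.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    points = np.stack([x, y, z], axis=-1).reshape(-1, 3)
    colors = rgb.reshape(-1, 3)
    # Drop pixels with no depth reading (encoded here as depth <= 0).
    valid = points[:, 2] > 0
    return points[valid], colors[valid]
```

With several sensors, each per-camera cloud would additionally need to be transformed into a shared frame using that camera's extrinsics before merging; that step is omitted here.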

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: RVT๋Š” voxel ๊ธฐ๋ฐ˜์˜ ๋†’์€ ์„ฑ๋Šฅ๊ณผ view ๊ธฐ๋ฐ˜์˜ ํ™•์žฅ์„ฑ์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฒฐํ•ฉํ•œ ํ˜์‹ ์  ๋ฐฉ๋ฒ•์œผ๋กœ, ์‹ค์งˆ์ ์ธ ํ›ˆ๋ จ ์‹œ๊ฐ„ ๋‹จ์ถ•๊ณผ ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋™์‹œ์— ๋‹ฌ์„ฑํ•˜์—ฌ ๋กœ๋ด‡ ์กฐ์ž‘ ์—ฐ๊ตฌ์˜ ๋ฐœ์ „์— ์ƒ๋‹นํ•œ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.
