HumanPlus: Humanoid Shadowing and Imitation from Humans

์ €์ž: Zipeng Fu, Qingqing Zhao, Qi Wu, Gordon Wetzstein, Chelsea Finn | ๋‚ ์งœ: 2024-06-15 | URL: https://arxiv.org/abs/2406.10454 📄 PDF


Essence

Figure 3: Shadowing and Retargeting. Our system uses one RGB camera for body and hand pose estimation.

A full-stack system for humanoid robots that combines a shadowing system, which lets the robot mimic human motion in real time using a single RGB camera, with an imitation learning pipeline that learns autonomous task skills from the data collected through shadowing.
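The two stages named above (shadowing to collect data, then imitation learning on that data) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the function names `retarget`, `shadow`, and `fit_nearest_neighbor_policy` are hypothetical, retargeting is reduced to clamping joint angles into assumed robot limits, and a 1-nearest-neighbor lookup stands in for a learned policy. It is not the paper's method, only a picture of the data flow.

```python
from typing import Callable, Iterable, List, Sequence, Tuple

Vector = Tuple[float, ...]
Demo = List[Tuple[Vector, Vector]]


def retarget(human_pose: Sequence[float],
             joint_limits: Sequence[Tuple[float, float]]) -> Vector:
    """Map estimated human joint angles to robot targets.

    Assumption: a one-to-one joint mapping, clamped to the robot's range.
    """
    return tuple(min(max(angle, lo), hi)
                 for angle, (lo, hi) in zip(human_pose, joint_limits))


def shadow(frames: Iterable[Tuple[Vector, Sequence[float]]],
           joint_limits: Sequence[Tuple[float, float]]) -> Demo:
    """Shadowing phase: log (robot observation, retargeted pose) pairs."""
    return [(obs, retarget(pose, joint_limits)) for obs, pose in frames]


def fit_nearest_neighbor_policy(demo: Demo) -> Callable[[Vector], Vector]:
    """Imitation phase: return the action whose logged observation is closest.

    A stand-in for a trained policy, chosen only because it needs no libraries.
    """
    def policy(obs: Vector) -> Vector:
        def sq_dist(pair: Tuple[Vector, Vector]) -> float:
            return sum((a - b) ** 2 for a, b in zip(pair[0], obs))
        return min(demo, key=sq_dist)[1]
    return policy


# Two hypothetical frames: (observation, estimated human pose).
limits = [(-1.0, 1.0), (0.0, 2.0)]
frames = [((0.0,), (0.5, 3.0)), ((1.0,), (-2.0, 1.0))]
demo = shadow(frames, limits)
policy = fit_nearest_neighbor_policy(demo)
print(policy((0.1,)))
```

The point of the sketch is the interface between the stages: shadowing yields observation-action pairs, and the imitation stage consumes only those pairs, so the human operator is no longer needed at execution time.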

Motivation

Achievement

Figure 1: Stanford HumanPlus Robot. We present a full-stack system for humanoid robots to learn motion and

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ ํœด๋จธ๋…ธ์ด๋“œ ๋กœ๋ด‡์˜ ์ธ๊ฐ„ ๋ฐ์ดํ„ฐ ํ™œ์šฉ์ด๋ผ๋Š” ์˜ค๋žซ๋™์•ˆ์˜ ๊ณผ์ œ์— ๋Œ€ํ•ด ์‹ค์šฉ์ ์ด๊ณ  ์™„์„ฑ๋„ ๋†’์€ end-to-end ์‹œ์Šคํ…œ์„ ์ œ์‹œํ–ˆ์œผ๋ฉฐ, RGB ์นด๋ฉ”๋ผ ๊ธฐ๋ฐ˜ shadowing์˜ ๋‹จ์ˆœ์„ฑ๊ณผ ํšจ์œจ์„ฑ, ๊ทธ๋ฆฌ๊ณ  ๋‹ค์–‘ํ•œ ์ž์œจ ์ž‘์—…์˜ ์„ฑ๊ณต์  ๊ตฌํ˜„์€ ๋กœ๋ด‡ ๊ณตํ•™ ๋ถ„์•ผ์— ์‹ค์งˆ์ ์ธ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.
