SteadyTray: Learning Object Balancing Tasks in Humanoid Tray Transport via Residual Reinforcement Learning

Authors: Anlun Huang, Zhenyu Wu, Soofiyan Atar, Yuheng Zhi, Michael Yip | Date: 2026-03-11 | DOI: 10.48550/arXiv.2603.10306


Essence

Figure 2

Fig. 2: Overview of the ReST-RL framework. Base Policy Training: A locomotion policy is first trained to carry a tray wh

ReST-RL is a hierarchical reinforcement learning architecture that adds a residual module to a pretrained bipedal locomotion policy, so that a humanoid robot can stably transport unstable objects on a tray during dynamic walking.
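The residual idea described above can be illustrated with a minimal sketch. It assumes the common additive formulation of residual reinforcement learning, in which a frozen base policy proposes an action and a separately trained residual policy adds a small correction. This review does not state the exact combination rule used in ReST-RL, so the additive form, the function name `compose_residual_action`, and the parameters `residual_scale` and `action_limit` are all hypothetical and chosen only for illustration.

```python
from typing import Callable, List, Sequence

# A policy here is any function mapping an observation to an action vector.
Policy = Callable[[Sequence[float]], List[float]]


def compose_residual_action(
    base_policy: Policy,
    residual_policy: Policy,
    observation: Sequence[float],
    residual_scale: float = 0.1,  # hypothetical: bounds the residual's influence
    action_limit: float = 1.0,    # hypothetical: symmetric action clipping range
) -> List[float]:
    """Add a scaled residual correction to a frozen base policy's action.

    Assumption: additive composition, as in generic residual RL. The
    paper's actual rule may differ.
    """
    base_action = base_policy(observation)
    correction = residual_policy(observation)
    if len(base_action) != len(correction):
        raise ValueError("base and residual actions must have the same length")

    # Base action plus a small correction, element by element.
    combined = [b + residual_scale * r for b, r in zip(base_action, correction)]

    # Clip each component so the correction cannot push the action out of range.
    return [max(-action_limit, min(action_limit, a)) for a in combined]


if __name__ == "__main__":
    # Toy stand-ins for trained networks (not real policies).
    frozen_base = lambda obs: [0.5, -0.2]
    tray_residual = lambda obs: [1.0, -1.0]
    print(compose_residual_action(frozen_base, tray_residual, observation=[0.0]))
```

Keeping `residual_scale` small is one way to let the residual adjust the tray without overriding the walking behavior the base policy already learned. Whether ReST-RL bounds the residual this way is not stated in this review.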

Motivation

Achievement

Figure 4

Fig. 4: Training reward comparison between End2End and

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ReST-RL์€ ๋ณดํ–‰ ์•ˆ์ •์„ฑ์„ ๋ณด์กดํ•˜๋ฉด์„œ payload ์•ˆ์ •ํ™”๋ฅผ ๋ถ„๋ฆฌ ํ•™์Šตํ•˜๋Š” ์šฐ์•„ํ•œ ์„ค๊ณ„๋กœ, ํœด๋จธ๋…ธ์ด๋“œ ๋กœ๋ด‡์˜ ์‹ค์ œ ์„œ๋น„์Šค ์‘์šฉ(์‹์Œ๋ฃŒ ๋ฐฐ์†ก, ์˜๋ฃŒ ๊ธฐ๊ตฌ ์šด๋ฐ˜)์— ํ•„์ˆ˜์ ์ธ ์‹ ๋ขฐ์„ฑ ๋†’์€ ๋ฌผ์ฒด ์šด๋ฐ˜์„ ์ฒ˜์Œ ์„ฑ๊ณต์ ์œผ๋กœ ์‹œ์—ฐํ–ˆ๋‹ค.
