Language to Rewards for Robotic Skill Synthesis

Authors: Wenhao Yu, Nimrod Gileadi, Chuyuan Fu, Sean Kirmani, Kuang-Huei Lee, Montse Gonzalez Arenas, Hao-Tien Lewis Chiang, Tom Erez, Leonard Hasenclever, Jan Humplik, Brian Ichter, Ted Xiao, Peng Xu, Andy Zeng, Tingnan Zhang, Nicolas Heess, Dorsa Sadigh, Jie Tan, Yuval Tassa, Fei Xia | Date: 2023-06-14 | URL: https://arxiv.org/abs/2306.08647


Essence

Figure 1: LLMs have some internal knowledge about robot motions, but cannot directly translate them into actions

The paper presents a new paradigm in which an LLM translates natural-language instructions into reward functions, and a real-time optimizer (MuJoCo MPC) then synthesizes the robot behavior that maximizes those rewards.
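As a rough illustration of the reward-then-optimize idea, the sketch below uses a toy 1-D point mass rather than a real robot. The `REWARD_SPEC` dictionary is a hypothetical stand-in for the reward parameters an LLM might emit, and a plain random-shooting planner stands in for MuJoCo MPC; none of these names or numbers come from the paper.

```python
import random

# Hypothetical reward parameters, standing in for what an LLM might emit
# for an instruction such as "move to position 1.0 and stay there".
REWARD_SPEC = {"target_position": 1.0, "position_weight": 1.0, "effort_weight": 0.01}


def step(position, velocity, force, dt=0.1):
    """Advance a unit-mass 1-D point by one Euler step."""
    velocity = velocity + force * dt
    position = position + velocity * dt
    return position, velocity


def reward(position, force, spec):
    """Weighted sum of a target-tracking term and an effort penalty."""
    tracking = -spec["position_weight"] * (position - spec["target_position"]) ** 2
    effort = -spec["effort_weight"] * force ** 2
    return tracking + effort


def rollout_return(position, velocity, forces, spec):
    """Total reward collected when applying a sequence of forces."""
    total = 0.0
    for force in forces:
        position, velocity = step(position, velocity, force)
        total += reward(position, force, spec)
    return total


def plan(position, velocity, spec, rng, horizon=10, samples=200):
    """Random shooting: sample force sequences, keep the best, return its first force."""
    best_forces, best_return = None, float("-inf")
    for _ in range(samples):
        forces = [rng.uniform(-2.0, 2.0) for _ in range(horizon)]
        ret = rollout_return(position, velocity, forces, spec)
        if ret > best_return:
            best_forces, best_return = forces, ret
    return best_forces[0]


def run_episode(spec, steps=60, seed=0):
    """Replan at every step (receding horizon) and return final position and total reward."""
    rng = random.Random(seed)
    position, velocity, total = 0.0, 0.0, 0.0
    for _ in range(steps):
        force = plan(position, velocity, spec, rng)
        position, velocity = step(position, velocity, force)
        total += reward(position, force, spec)
    return position, total


if __name__ == "__main__":
    final_position, total_reward = run_episode(REWARD_SPEC)
    print(f"final position: {final_position:.3f}, total reward: {total_reward:.3f}")
```

Changing only `target_position` or the weights in `REWARD_SPEC` changes the resulting behavior while the planner stays untouched, which is the separation of concerns the summary describes.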

Motivation

Achievement

Figure 4: Comparison of our method and alternative methods in terms of pass rate: if we generate N pieces of code

How

Figure 2: Detailed dataflow of the Reward Translator. A Motion Descriptor LLM takes the user input and describe
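The caption is cut off, so the following is only a minimal sketch of how a staged translator could be wired. It assumes a first LLM call that describes the motion and a second call that turns that description into reward code. The second stage, the prompt templates, `fake_llm`, and `set_torso_height_reward` are all hypothetical and are not taken from the paper.

```python
from typing import Callable

# Hypothetical prompt templates; the actual prompts are not given in this review.
MOTION_DESCRIPTOR_PROMPT = "Describe the robot motion for this instruction:\n{instruction}"
REWARD_CODE_PROMPT = "Write reward-setting code for this motion description:\n{description}"


def translate_to_reward(instruction: str, llm: Callable[[str], str]) -> dict:
    """Chain two LLM calls: instruction -> motion description -> reward code."""
    description = llm(MOTION_DESCRIPTOR_PROMPT.format(instruction=instruction))
    reward_code = llm(REWARD_CODE_PROMPT.format(description=description))
    return {"description": description, "reward_code": reward_code}


def fake_llm(prompt: str) -> str:
    """Canned stand-in for a real LLM so the sketch runs offline."""
    if prompt.startswith("Describe"):
        return "The torso height should be 0.3 m."
    # Hypothetical reward API call, for illustration only.
    return "set_torso_height_reward(0.3)"


if __name__ == "__main__":
    result = translate_to_reward("Make the robot stand up.", fake_llm)
    print(result["description"])
    print(result["reward_code"])
```

Passing the LLM in as a plain callable keeps the chaining logic testable without network access; a real client would replace `fake_llm`.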

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper presents an innovative approach that uses an LLM as a reward-function generator to bridge the gap between natural language and low-level robot motion. Strong experimental results and validation on a real robot support the soundness of the method, and the work points to a new direction for applying LLMs to robot control.
