Instruct2Act: Mapping Multi-modality Instructions to Robotic Actions with Large Language Model

Authors: Siyuan Huang, Zhengkai Jiang, Hao Dong, Yu Qiao, Peng Gao, Hongsheng Li | Date: 2023-05-18 | URL: https://arxiv.org/abs/2305.11176


Essence

Figure 1

This paper proposes Instruct2Act, a framework that uses a Large Language Model (LLM) to map natural-language and visual instructions to sequences of actions for robotic manipulation tasks. The LLM generates Python programs that implement a perception, planning, and action loop, calling foundation models such as SAM and CLIP as APIs.
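To make the perception, planning, and action loop concrete, here is a minimal sketch of what such a generated program might look like. It is not the paper's code: `segment`, `classify`, `pick_place`, the `Region` type, and the example instruction are hypothetical stand-ins, and the two perception functions are simple stubs in place of real SAM and CLIP calls so the example runs on its own.

```python
from dataclasses import dataclass


@dataclass
class Region:
    """A segmented image region (hypothetical type for this sketch)."""
    label_scores: dict  # stand-in for visual features: text query -> similarity
    box: tuple          # bounding box as (x, y, width, height)


def segment(image):
    """Perception stub standing in for a SAM-style segmenter."""
    return image["regions"]


def classify(regions, query):
    """Stub standing in for CLIP-style retrieval: best-scoring region for a query."""
    return max(regions, key=lambda r: r.label_scores.get(query, 0.0))


def center(box):
    """Return the center point of an (x, y, width, height) box."""
    x, y, w, h = box
    return (x + w / 2, y + h / 2)


def pick_place(pick_xy, place_xy):
    """Action stub: return the command instead of driving a robot."""
    return {"pick": pick_xy, "place": place_xy}


def generated_program(image):
    """A program an LLM might emit for the assumed instruction
    'put the red block in the green bowl'."""
    regions = segment(image)                   # perception
    block = classify(regions, "red block")     # planning: resolve the object
    bowl = classify(regions, "green bowl")     # planning: resolve the target
    return pick_place(center(block.box), center(bowl.box))  # action


# Toy scene with two pre-segmented regions and made-up similarity scores.
scene = {"regions": [
    Region({"red block": 0.9, "green bowl": 0.1}, (10, 20, 4, 4)),
    Region({"red block": 0.2, "green bowl": 0.8}, (40, 50, 10, 10)),
]}
action = generated_program(scene)
```

In this sketch the LLM only writes `generated_program`; the perception and action functions are a fixed API, which is what allows the foundation models behind them to be swapped without changing the generated code.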

Motivation

Achievement

Figure 4: Evaluation task suite. We select six tabletop manipulation meta tasks to evaluate the pro-

How

Figure 2: The paradigm of our proposed Instruct2Act framework. Starting with the task instruc-

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

Overall assessment: This paper presents a practical framework that effectively combines an LLM with visual foundation models to map multimodal instructions to robot actions. Its main contribution is achieving strong performance in a training-free, zero-shot manner. However, the scope of the evaluation is limited, and the analysis of error-propagation mechanisms needs to be strengthened.
