ARNOLD: A Benchmark for Language-Grounded Task Learning With Continuous States in Realistic 3D Scenes

Authors: Ran Gong, Jiangyong Huang, Yizhou Zhao, Haoran Geng, Xiaofeng Gao, Qingyang Wu, Wensi Ai, Ziheng Zhou, Demetri Terzopoulos, Song-Chun Zhu, Baoxiong Jia, Siyuan Huang | Date: 2023-04-09 | URL: https://arxiv.org/abs/2304.04321


Essence

Figure 1

Figure 1. The ARNOLD benchmark for language-grounded task learning with continuous states in realistic 3D scenes. ARNOLD

ARNOLD is a benchmark for evaluating robots that must understand continuous object states and learn language-grounded manipulation tasks in realistic 3D scenes. It comprises 8 language-conditioned tasks, detailed physics simulation, and a diverse set of scenes and objects.

Motivation

Achievement

  1. Comprehensive benchmark: presents the first benchmark to support continuous object states, friction-based grasping, and diverse scene backgrounds in a realistic 3D interactive environment.
  2. Systematic evaluation framework: evaluates generalization separately for novel goal states (Novel State), novel objects (Novel Object), and novel scenes (Novel Scene).
  3. Limitations of existing methods identified: demonstrates that state-of-the-art language-conditioned policy learning models for manipulation still struggle markedly to generalize to novel states and scenes.
  4. Empirical analysis: points to future research directions through extensive experimental analysis, including the importance of state modeling, and ablation studies.
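As a rough illustration of item 2, the per-split evaluation can be pictured as grouping episode outcomes by generalization axis and reporting a success rate for each. The sketch below is not ARNOLD's evaluation code: the split names, the `success_rate_by_split` helper, and the demo outcomes are all hypothetical and exist only to show the bookkeeping.

```python
from collections import defaultdict

# Hypothetical split names mirroring the three generalization axes listed above.
SPLITS = ("novel_state", "novel_object", "novel_scene")


def success_rate_by_split(episodes):
    """Return {split: success rate} for an iterable of (split, succeeded) pairs."""
    totals = defaultdict(int)
    wins = defaultdict(int)
    for split, succeeded in episodes:
        if split not in SPLITS:
            raise ValueError(f"unknown split: {split}")
        totals[split] += 1
        wins[split] += int(succeeded)
    # Only report splits that actually have episodes, to avoid dividing by zero.
    return {s: wins[s] / totals[s] for s in SPLITS if totals[s]}


# Made-up outcomes for illustration only; these are not results from the paper.
demo = [
    ("novel_state", True), ("novel_state", False),
    ("novel_object", True), ("novel_object", True),
    ("novel_scene", False), ("novel_scene", False),
]
rates = success_rate_by_split(demo)
```

Reporting the three rates separately, rather than one pooled number, is what lets a reader see which axis of generalization a model fails on.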

How

Figure 2

Figure 2. Multi-view robot observation in ARNOLD. The top row

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: ARNOLD is a comprehensive, well-designed benchmark that fills an important gap in language-grounded robot task learning, namely understanding continuous object states and evaluating generalization. Through realistic physics simulation and a systematic evaluation framework, it clearly exposes the limitations of existing methods, and it is a valuable resource that can make a substantive contribution to future research.
