Context-Aware Entity Grounding with Open-Vocabulary 3D Scene Graphs

Authors: Haonan Chang, Kowndinya Boyalakuntla, Shiyang Lu, Siwei Cai, Eric Jing, Shreesh Keskar, Shijie Geng, Adeeb Abbas, Lifeng Zhou, Kostas Bekris, Abdeslam Boularias | Date: 2023-09-27 | URL: https://arxiv.org/abs/2309.15940


Essence

Figure 1: This is an illustration of the proposed pipeline. The system inputs are the positional input Pu, user input Lu

Open-Vocabulary 3D Scene Graph (OVSG) is a framework that localizes a variety of entities, such as objects, agents, and regions, in a context-aware manner from free-form text queries. Unlike existing approaches built on a fixed set of semantic labels, it can also handle categories and relationships that were not defined in advance.
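
To make "context-aware" concrete, here is a minimal toy sketch of grounding a free-form query against a scene graph whose nodes carry text descriptions instead of fixed class labels. It is not the authors' implementation. The `SceneGraph` class, the `ground` function, and the word-overlap `similarity` score are all hypothetical, and the word overlap is only a stand-in for the learned text encoder that a real open-vocabulary system would use.

```python
from dataclasses import dataclass, field


def similarity(a: str, b: str) -> float:
    """Jaccard overlap of lowercase word sets (assumed stand-in for a learned text encoder)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0


@dataclass
class SceneGraph:
    """Hypothetical toy scene graph: free-form node descriptions plus relation triples."""
    nodes: dict = field(default_factory=dict)  # node id -> description
    edges: list = field(default_factory=list)  # (subject id, relation, object id)


def ground(graph, target, context=None):
    """Return the id of the node that best matches `target`.

    `context` is an optional (relation, anchor description) pair, e.g.
    ("in", "kitchen sink"), used to tell apart nodes with the same description.
    """
    best_id, best_score = None, float("-inf")
    for node_id, desc in graph.nodes.items():
        score = similarity(target, desc)
        if context is not None:
            relation, anchor = context
            # Score each outgoing edge by how well its relation and its
            # object node match the requested context; keep the best one.
            support = [
                similarity(relation, r) * similarity(anchor, graph.nodes[o])
                for s, r, o in graph.edges
                if s == node_id
            ]
            score += max(support, default=0.0)
        if score > best_score:
            best_id, best_score = node_id, score
    return best_id


# Two identical cups that only their surroundings can tell apart.
scene = SceneGraph(
    nodes={0: "white cup", 1: "white cup", 2: "wooden table", 3: "kitchen sink"},
    edges=[(0, "on", 2), (1, "in", 3)],
)
cup_in_sink = ground(scene, "white cup", ("in", "kitchen sink"))
cup_on_table = ground(scene, "white cup", ("on", "wooden table"))
```

Without the context pair, both cups score the same and the query is ambiguous. Adding the relation and anchor is what separates them, which is the behavior the summary above attributes to OVSG.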

Motivation

Achievement

Figure 5: Performance of OVSG w.r.t. Grounding Success Rate_BB on ScanNet scenes

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: OVSG is a meaningful contribution that integrates open-vocabulary capability into 3D scene graphs so that robots can understand natural, context-based instructions. It demonstrates practical value through real-robot experiments and a new dataset, but scene reconstruction accuracy and scalability leave room for improvement.
