CLIP-Fields: Weakly Supervised Semantic Fields for Robotic Memory

Authors: Nur Muhammad Mahi Shafiullah, Chris Paxton, Lerrel Pinto, Soumith Chintala, Arthur Szlam | Date: 2022-10-11 | URL: https://arxiv.org/abs/2210.05663


Essence

Figure 1

Fig. 1: Our approach, CLIP-Fields, integrates multiple views of a

CLIP-Fields is an implicit neural field that maps spatial coordinates to the semantic embeddings of web-pretrained models such as CLIP, Detic, and Sentence-BERT. It serves as a 3D semantic memory for robots without direct human supervision.
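The coordinate-to-embedding mapping can be sketched in a few lines. This is a minimal illustration only, not the paper's architecture or training procedure: the tiny tanh network, its random untrained weights, the 8-dimensional embedding size, and the names `make_field` and `locate` are all assumptions made for the example. In a real system the query embedding would presumably come from a text encoder such as CLIP or Sentence-BERT; here it is faked by reading the field at a known point.

```python
import math
import random

EMBED_DIM = 8  # hypothetical size; real CLIP / Sentence-BERT embeddings are much larger


def make_field(hidden=16, seed=0):
    """Build f(x, y, z) -> unit-norm embedding from a tiny, untrained MLP."""
    rng = random.Random(seed)
    w1 = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(hidden)]
    w2 = [[rng.gauss(0, 1) for _ in range(hidden)] for _ in range(EMBED_DIM)]

    def field(x, y, z):
        # Hidden layer: tanh of a linear map of the 3D coordinate.
        h = [math.tanh(r[0] * x + r[1] * y + r[2] * z) for r in w1]
        # Output layer: linear map to embedding space, then L2-normalize.
        out = [sum(w * v for w, v in zip(row, h)) for row in w2]
        norm = math.sqrt(sum(v * v for v in out)) or 1.0
        return [v / norm for v in out]

    return field


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    na = math.sqrt(sum(p * p for p in a))
    nb = math.sqrt(sum(q * q for q in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(a, b)) / (na * nb)


def locate(field, query_embedding, candidates):
    """Return the candidate point whose field embedding best matches the query."""
    return max(candidates, key=lambda p: cosine(field(*p), query_embedding))


field = make_field()
points = [(1.0, 0.0, 0.5), (2.0, 1.0, 0.0), (-1.0, 2.0, 1.0)]
# Stand-in for a text-encoder output: the field's own embedding at one point.
query = field(1.0, 0.0, 0.5)
best = locate(field, query, points)
```

Used as a memory, the field answers "where is X?" by scoring candidate locations against the embedding of X and returning the best match, which is what `locate` does over the three hypothetical points.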

Motivation

Achievement

Figure 4

Fig. 4: Mean average precision in instance segmentation on the

How

Figure 2

Fig. 2: Dataset creation process for CLIP-Fields by processing

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: CLIP-Fields is an innovative approach that builds an open-vocabulary 3D semantic memory through weak supervision from web-pretrained models, eliminating human annotation entirely. Its practicality for robotic applications and its strong performance with little data make it a very important contribution, but large-scale evaluation in real robot environments and the handling of dynamic scenes remain future work.
