Code-as-Monitor: Constraint-aware Visual Programming for Reactive and Proactive Robotic Failure Detection

Authors: Enshen Zhou, Qi Su, Cheng Chi, Zhizheng Zhang, Zhongyuan Wang, Tiejun Huang, Lu Sheng, He Wang | Date: 2024-12-05 | URL: https://arxiv.org/abs/2412.04455


Essence


Figure 2. Overview of Code-as-Monitor. Given task instructions and prior information, the Constraint Generator derives t

The paper proposes the Code-as-Monitor (CaM) paradigm, which uses a VLM to formulate robot failure detection as a spatio-temporal constraint satisfaction problem and abstracts the relevant entities into constraint elements so that VLM-generated code can monitor them in real time.
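As a rough illustration of what monitoring a constraint with code could look like, the sketch below checks the "without losing the lobster" condition from the Figure 1 task over a sequence of frames. Everything in it is an assumption made for illustration, not the paper's implementation: the `Frame` layout (element name mapped to a tracked 3D point), the element names `lobster` and `pan_center`, the `pan_radius` threshold, and both function names are hypothetical.

```python
from math import dist
from typing import Callable, Dict, List, Optional, Tuple

Point = Tuple[float, float, float]
Frame = Dict[str, Point]  # hypothetical layout: element name -> tracked 3D position


def lobster_stays_in_pan(frame: Frame, pan_radius: float = 0.15) -> bool:
    """Hypothetical monitor code: the lobster must stay within the pan's footprint."""
    lx, ly, _ = frame["lobster"]
    px, py, _ = frame["pan_center"]
    # Spatial part of the constraint: horizontal distance bounded by the pan radius.
    return dist((lx, ly), (px, py)) <= pan_radius


def first_violation(frames: List[Frame], monitor: Callable[[Frame], bool]) -> Optional[int]:
    """Temporal part: return the index of the first violating frame, or None."""
    for i, frame in enumerate(frames):
        if not monitor(frame):
            return i
    return None


# Made-up tracked positions: the pan moves away while the lobster is left behind.
frames = [
    {"lobster": (0.00, 0.00, 0.05), "pan_center": (0.0, 0.0, 0.0)},
    {"lobster": (0.05, 0.00, 0.05), "pan_center": (0.1, 0.0, 0.0)},
    {"lobster": (0.05, 0.00, 0.00), "pan_center": (0.3, 0.0, 0.0)},
]

violation_index = first_violation(frames, lobster_stays_in_pan)
```

With these made-up positions the check passes on the first two frames and fails on the third, so `violation_index` is 2. The point of the sketch is only that a per-frame geometric test over a few tracked points is cheap to evaluate, which is what would let such a check run alongside execution.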

Motivation

Achievement


Figure 1. For the task “Move the pan with lobster to the stove without losing the lobster”, (a) reactive failure detecti

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper proposes the Code-as-Monitor paradigm, the first to unify open-set reactive and proactive failure detection. Its constraint-elements abstraction is a creative way to resolve the tension between the generalization ability of VLMs and real-time efficiency, which makes it a strong contribution. Extensive validation across diverse environments and robot platforms, together with a clear framework design, adds to its value.
