AutoEval: Autonomous Evaluation of Generalist Robot Manipulation Policies in the Real World

Authors: Zhiyuan Zhou, Pranav Atreya, You Liang Tan, Karl Pertsch, Sergey Levine | Date: 2025-03-31 | URL: https://arxiv.org/abs/2503.24278


Essence

Figure 1: We introduce AutoEval, a system for scalable, automated real robot evaluation of generalist robot policies.

AutoEval is a real-world autonomous evaluation system that targets the bottleneck in large-scale robot policy evaluation. It combines automated success detection with automated scene resets, reducing human intervention by more than 99% while enabling continuous, 24-hour evaluation.
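The loop summarized above (roll out a policy, detect success automatically, reset the scene automatically) can be sketched roughly as follows. This is only an illustrative sketch, not the authors' implementation: `autonomous_eval`, `EvalResult`, and the three callables `run_policy`, `detect_success`, and `reset_scene` are hypothetical names introduced here, and counting a failed reset as one human intervention is an assumption.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class EvalResult:
    """Aggregate outcome of one evaluation run (hypothetical container)."""
    episodes: int
    successes: int
    human_interventions: int

    @property
    def success_rate(self) -> float:
        # Guard against division by zero when no episodes were run.
        return self.successes / self.episodes if self.episodes else 0.0


def autonomous_eval(
    run_policy: Callable[[], Any],          # hypothetical: runs one episode, returns final observation
    detect_success: Callable[[Any], bool],  # hypothetical: automated success detector
    reset_scene: Callable[[], bool],        # hypothetical: automated reset, True if it worked
    num_episodes: int,
) -> EvalResult:
    """Run episodes back to back without a human in the loop."""
    successes = 0
    interventions = 0
    for _ in range(num_episodes):
        final_obs = run_policy()
        if detect_success(final_obs):
            successes += 1
        # Assumption: a failed automated reset is what requires a human.
        if not reset_scene():
            interventions += 1
    return EvalResult(num_episodes, successes, interventions)


if __name__ == "__main__":
    # Stub callables stand in for a real robot, detector, and reset routine.
    outcomes = iter([True, False, True, True])
    result = autonomous_eval(
        run_policy=lambda: next(outcomes),
        detect_success=lambda obs: bool(obs),
        reset_scene=lambda: True,
        num_episodes=4,
    )
    print(result.success_rate, result.human_interventions)
```

Passing the three components as plain callables keeps the loop independent of any particular robot stack, so the same bookkeeping would apply whichever policy, detector, or reset mechanism is plugged in.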

Motivation

Achievement

How

Figure 2: Bridge-AutoEval cell: our robot setup for au-

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: AutoEval์€ generalist ๋กœ๋ด‡ ์ •์ฑ… ํ‰๊ฐ€์˜ ์‹ฌ๊ฐํ•œ ํ™•์žฅ์„ฑ ๋ฌธ์ œ๋ฅผ ์‹ค์งˆ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋Š” ํ˜์‹ ์ ์ธ ์‹œ์Šคํ…œ์œผ๋กœ, ์ž๋™ํ™”๋œ ๋ฆฌ์…‹๊ณผ ์„ฑ๊ณต ๊ฐ์ง€๋ฅผ ํ†ตํ•ด ์ธ๊ฐ„ ๊ฐœ์ž…์„ ๊ทน์ ์œผ๋กœ ์ค„์ด๋ฉด์„œ๋„ ์‹ ๋ขฐํ•  ์ˆ˜ ์žˆ๋Š” ๊ฒฐ๊ณผ๋ฅผ ์ œ๊ณตํ•œ๋‹ค. ๊ณต๊ฐœ ๋ฒค์น˜๋งˆํ‚น ํ”Œ๋žซํผ ์ œ๊ณต์œผ๋กœ ๋กœ๋ด‡ ํ•™์Šต ์ปค๋ฎค๋‹ˆํ‹ฐ์— ์ค‘๋Œ€ํ•œ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.
