RoboTron-Nav: A Unified Framework for Embodied Navigation Integrating Perception, Planning, and Prediction

Authors: Yufeng Zhong, Chengjian Feng, Feng Yan, Fanfan Liu, Liming Zheng, Lin Ma | Date: 2025-03-24 | URL: https://arxiv.org/abs/2503.18525


Essence

Figure 3

Figure 3. Overview of the RoboTron-Nav architecture. The current frame I_t is initially processed through 2D and 3D feature e

RoboTron-Nav is an embodied navigation framework that unifies perception, planning, and prediction. It improves language-guided visual navigation performance through multitask collaboration (navigation + EQA) and adaptive 3D-aware history sampling.

Motivation

Achievement

Figure 2

Figure 2. Top: During long-term navigation, agents may revisit

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: Through two innovative components, multitask collaboration and adaptive history sampling, RoboTron-Nav improves both the interpretability and the efficiency of embodied navigation, and its SOTA performance gives it high practical value. However, the dataset construction methodology and the feasibility of real-time deployment need further validation.
