Embodied Navigation Foundation Model

์ €์ž: Jiazhao Zhang, Anqi Li, Yunpeng Qi, Minghan Li, Jiahang Liu, Shaoan Wang, Haoran Liu, Gengze Zhou, Yuze Wu, Xingxing Li, Yuxin Fan, Wenjun Li, Zhibo Chen, Fei Gao, Qi Wu, Zhizheng Zhang, He Wang | ๋‚ ์งœ: 2025-09-15 | URL: https://arxiv.org/abs/2509.12129 📄 PDF


Essence

Figure 1

Figure 1: We provide an illustration of architecture (left) alongside real-world experiment results (right). The

NavFoM์€ 8๋ฐฑ๋งŒ ๊ฐœ์˜ ๋„ค๋น„๊ฒŒ์ด์…˜ ์ƒ˜ํ”Œ๋กœ ํ•™์Šต๋œ ํฌ๋กœ์Šค-๊ตฌํ˜„์ฒดยทํฌ๋กœ์Šค-ํƒœ์Šคํฌ ๊ธฐ๋ฐ˜ ๋„ค๋น„๊ฒŒ์ด์…˜ ๋ชจ๋ธ๋กœ, ๋‹ค์–‘ํ•œ ๋กœ๋ด‡ ํ”Œ๋žซํผ๊ณผ ๋„ค๋น„๊ฒŒ์ด์…˜ ์ž‘์—…์—์„œ ๋ฏธ์„ธ ์กฐ์ • ์—†์ด ์ตœ์ฒจ๋‹จ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•œ๋‹ค.

Motivation

Achievement

Figure 2

Figure 2: Benchmark performance of NavFoM, we compare NavFoM with SOTA baselines on each bench-

How

Figure 3

Figure 3: Pipeline of NavFoM. Our method provides a unified framework for handling multiple tasks, includ-

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: NavFoM์€ ์‹ ์ฒดํ™”๋œ AI ๋ถ„์•ผ์—์„œ ํฌ๋กœ์Šค-๊ตฌํ˜„์ฒดยทํฌ๋กœ์Šค-ํƒœ์Šคํฌ ๋„ค๋น„๊ฒŒ์ด์…˜์„ ์ฒ˜์Œ์œผ๋กœ ํ†ตํ•ฉ์ ์œผ๋กœ ํ•ด๊ฒฐํ•œ ๋Œ€๊ทœ๋ชจ ๊ธฐ์ดˆ ๋ชจ๋ธ๋กœ, TVI ํ† ํฐ๊ณผ BATS ์ „๋žต์˜ ํ˜์‹ ์  ์„ค๊ณ„๋กœ ๋‹ค์–‘ํ•œ ๋กœ๋ด‡ ํ”Œ๋žซํผ๊ณผ ๋„ค๋น„๊ฒŒ์ด์…˜ ์ž‘์—…์—์„œ ๋ฏธ์„ธ ์กฐ์ • ์—†์ด ๊ฐ•๋ ฅํ•œ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ์„ ์ž…์ฆํ•˜์˜€๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •