NORA-1.5: A Vision-Language-Action Model Trained using World Model- and Action-based Preference Rewards

์ €์ž: Chia-Yu Hung, Navonil Majumder, Haoyuan Deng, Liu Renhang, Yankang Ang, Amir Zadeh, Chuan Li, Dorien Herremans, Ziwei Wang, Soujanya Poria | ๋‚ ์งœ: 2025-11-18 | URL: https://arxiv.org/abs/2511.14659 📄 PDF


Essence

Figure 1

Figure 1. Training pipeline of NORA-1.5 where firstly a VLA model is pre-trained through imitation learning and subseque

NORA-1.5 improves the performance of a VLA model by adding a flow-matching-based action expert. It then applies DPO-based post-training with world-model-based and action-based rewards, which improves reliability and generalization in real-world robot environments.
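The flow-matching idea behind the action expert can be sketched as follows. This is a minimal illustration under stated assumptions, not NORA-1.5's implementation. It assumes the rectified-flow convention: Gaussian noise at t=0, the action at t=1, and straight-line interpolation between them. Actions are treated as flat lists of floats. The learned action expert is replaced by a hypothetical `toy_velocity` oracle that already knows the target action. In the real model, the velocity would be predicted by the action expert network conditioned on the VLM's observation and instruction features. The paper's exact time convention and number of integration steps may differ.

```python
import random
from typing import Callable, List

Vector = List[float]
# A velocity field maps (x_t, t, observation features) to dx/dt.
VelocityFn = Callable[[Vector, float, Vector], Vector]


def interpolate(noise: Vector, action: Vector, t: float) -> Vector:
    """Point at time t on the straight path from noise (t=0) to the action (t=1)."""
    return [(1.0 - t) * n + t * a for n, a in zip(noise, action)]


def flow_matching_loss(velocity_fn: VelocityFn, noise: Vector, action: Vector,
                       t: float, obs: Vector) -> float:
    """Mean squared error between predicted velocity and the target (action - noise)."""
    x_t = interpolate(noise, action, t)
    pred = velocity_fn(x_t, t, obs)
    target = [a - n for n, a in zip(noise, action)]
    return sum((p - g) ** 2 for p, g in zip(pred, target)) / len(target)


def sample_action(velocity_fn: VelocityFn, obs: Vector, dim: int,
                  steps: int = 10, seed: int = 0) -> Vector:
    """Euler-integrate dx/dt = v(x, t, obs) from Gaussian noise at t=0 up to t=1."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    dt = 1.0 / steps
    for i in range(steps):
        t = i * dt
        v = velocity_fn(x, t, obs)
        x = [xi + dt * vi for xi, vi in zip(x, v)]
    return x


def toy_velocity(x: Vector, t: float, obs: Vector) -> Vector:
    """Hypothetical oracle standing in for the action expert.

    For this demo `obs` is the target action itself, and the field points straight at it.
    Inside sample_action t stays below 1, so there is no division by zero.
    """
    return [(a - xi) / (1.0 - t) for xi, a in zip(x, obs)]


target = [0.2, -0.5, 1.0]
print(flow_matching_loss(toy_velocity, [0.1, 0.4, -1.2], target, 0.3, target))
print(sample_action(toy_velocity, target, dim=3))
```

Along the straight path, the oracle's velocity equals `action - noise`, so it drives the training loss to (numerically) zero. Ten Euler steps then carry the noise sample onto the target action. A trained expert only approximates this field, so its samples land near a plausible action rather than exactly on one.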
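The reward-based DPO post-training step can be sketched in the same hedged spirit. The sketch makes several assumptions:

- Several action candidates are sampled per observation.
- Each candidate carries a world-model score supplied as a precomputed number; the world model itself is not modeled here.
- The action-based reward is taken to be the negative L2 distance to a reference (demonstration) action.
- The two rewards are combined with hypothetical weights `w_wm` and `w_act`.
- The best- and worst-scored candidates form the chosen/rejected pair.

The loss is the standard DPO objective, `-log sigmoid(beta * [(log pi_theta(y_w) - log pi_ref(y_w)) - (log pi_theta(y_l) - log pi_ref(y_l))])`. How NORA-1.5 obtains these log-probabilities for its action expert is not shown here, so they are passed in as given values.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Candidate:
    action: List[float]   # sampled action chunk, flattened
    wm_reward: float      # world-model score, assumed precomputed (hypothetical)
    logp_policy: float    # log pi_theta(action | obs) under the model being trained
    logp_ref: float       # log pi_ref(action | obs) under the frozen reference model


def action_reward(action: Sequence[float], reference: Sequence[float]) -> float:
    """Hypothetical action-based reward: negative L2 distance to a reference action."""
    return -math.sqrt(sum((a - r) ** 2 for a, r in zip(action, reference)))


def combined_reward(c: Candidate, reference: Sequence[float],
                    w_wm: float = 0.5, w_act: float = 0.5) -> float:
    """Weighted sum of the two reward signals (weights are illustrative)."""
    return w_wm * c.wm_reward + w_act * action_reward(c.action, reference)


def make_pair(cands: Sequence[Candidate],
              reference: Sequence[float]) -> Tuple[Candidate, Candidate]:
    """Highest-reward candidate is 'chosen', lowest-reward is 'rejected'."""
    ranked = sorted(cands, key=lambda c: combined_reward(c, reference), reverse=True)
    return ranked[0], ranked[-1]


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_loss(chosen: Candidate, rejected: Candidate, beta: float = 0.1) -> float:
    """Standard DPO loss for one preference pair."""
    margin = ((chosen.logp_policy - chosen.logp_ref)
              - (rejected.logp_policy - rejected.logp_ref))
    return -log_sigmoid(beta * margin)


reference = [0.0, 0.0]
good = Candidate([0.1, 0.0], wm_reward=1.0, logp_policy=-1.0, logp_ref=-1.0)
bad = Candidate([1.0, 1.0], wm_reward=0.0, logp_policy=-1.0, logp_ref=-1.0)
chosen, rejected = make_pair([bad, good], reference)
print(dpo_loss(chosen, rejected))  # equals log(2) while the policy still matches the reference
```

Minimizing this loss raises the policy's likelihood of the higher-reward candidate relative to the lower-reward one, measured against the frozen reference. As a result, the update itself needs no separate reward model or on-policy RL loop.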

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: NORA-1.5๋Š” flow-matching ๊ธฐ๋ฐ˜ ์•„ํ‚คํ…์ฒ˜ ๊ฐœ์„ ๊ณผ ๊ฒฝ๋Ÿ‰์ด๋ฉด์„œ๋„ ํšจ๊ณผ์ ์ธ reward ๊ธฐ๋ฐ˜ post-training์„ ๊ฒฐํ•ฉํ•˜์—ฌ VLA ๋ชจ๋ธ์˜ ์‹ ๋ขฐ์„ฑ๊ณผ ์‹ค์ œ ๋ฐฐํฌ ๊ฐ€๋Šฅ์„ฑ์„ ํฌ๊ฒŒ ํ–ฅ์ƒ์‹œํ‚จ ์˜๋ฏธ ์žˆ๋Š” ์—ฐ๊ตฌ์ด๋‹ค. ๊ด‘๋ฒ”์œ„ํ•œ ๋ฒค์น˜๋งˆํฌ์—์„œ์˜ ์„ฑ๊ณผ์™€ ํ™•์žฅ ๊ฐ€๋Šฅํ•œ post-training ๋ฐฉ๋ฒ•๋ก ์€ embodied AI ๋ถ„์•ผ์— ์‹ค์งˆ์ ์ธ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •