IPR-1: Interactive Physical Reasoner

Authors: Mingyu Zhang, Lifeng Zhuo, Tianxi Tan, Guocan Xie, Xian Nie, Yan Li, Renjie Zhao, Zizhu He, Ziyu Wang, Jiting Cai, Yong-Lu Li | Date: 2025-11-19 | URL: https://arxiv.org/abs/2511.15407


Essence

Figure 5. IPR training pipeline. Stage 1: PhysCode pre-training. Video clips with optical flow and action semantics are

Interactive Physical Reasoner (IPR) is an agent that learns physical reasoning through interaction by reinforcing a VLM's policy with world-model rollouts. It introduces PhysCode, a physics-centric action code that aligns semantic intent with dynamics, and it is pre-trained on 1,000+ games, showing robust performance from physical intuition through goal-directed reasoning.
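As a rough illustration of what "reinforcing a policy with world-model rollouts" can look like, the toy sketch below scores each candidate action by its imagined return and picks the best one. Nothing in it is taken from the paper: the 1-D dynamics, the fixed candidate list standing in for a VLM policy, and every function name (`toy_world_model`, `propose_actions`, `rollout_return`, `select_action`) are hypothetical, and it makes no attempt to model PhysCode.

```python
def toy_world_model(state, action):
    """Hypothetical 1-D dynamics: the state shifts by the action.
    Reward is the negative distance to the goal at 0."""
    next_state = state + action
    reward = -abs(next_state)
    return next_state, reward


def propose_actions(state):
    """Stand-in for a policy (assumption): always proposes the same
    three candidate actions, regardless of state."""
    return [-1, 0, 1]


def rollout_return(state, first_action, horizon=3, gamma=0.9):
    """Imagined discounted return: take first_action, then follow a
    one-step-greedy policy inside the toy world model."""
    total, discount = 0.0, 1.0
    action = first_action
    for _ in range(horizon):
        state, reward = toy_world_model(state, action)
        total += discount * reward
        discount *= gamma
        # Next imagined action: best immediate reward under the model.
        action = max(propose_actions(state),
                     key=lambda a: toy_world_model(state, a)[1])
    return total


def select_action(state):
    """Score every candidate by its imagined return and pick the best."""
    scores = {a: rollout_return(state, a) for a in propose_actions(state)}
    best = max(scores, key=scores.get)
    return best, scores


if __name__ == "__main__":
    for start in (2, 0, -2):
        action, scores = select_action(start)
        print(start, action, scores)
```

In this toy setting the selected action always moves the state toward the goal at 0. A real system would replace the hand-written dynamics with a learned world model and the fixed candidate list with actions proposed by the policy.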

Motivation

Achievement

Figure 2. Three-level evaluation inspired by Maslow's hierarchy of needs. We organize tasks into a pyramid of Survival,

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: IPR presents an innovative approach that unifies a VLM and a world model through a physics-centric action space, and it shows strong performance and transfer ability on a large-scale, heterogeneous game benchmark. It effectively demonstrates the feasibility of interaction-based physical reasoning, but its scalability to real-world robotics environments and its computational efficiency need further validation.
