SimGenHOI: Physically Realistic Whole-Body Humanoid-Object Interaction via Generative Modeling and Reinforcement Learning

์ €์ž: Yuhang Lin, Yijia Xie, Jiahong Xie, Yuehao Huang, Ruoyu Wang, Jiajun Lv, Yukai Ma, Xingxing Zuo | ๋‚ ์งœ: 2025-08-18 | URL: https://arxiv.org/abs/2508.14120 📄 PDF


Essence

Figure 2

Figure 2: Our proposed framework uses a diffusion model for key action generation and reinforcement learning to train

SimGenHOI is a unified framework that combines a Diffusion Transformer-based generative model with a reinforcement-learning-based, contact-aware control policy to generate physically realistic humanoid-object interactions. Through a mutual fine-tuning strategy, the generative model and the control policy iteratively improve each other, which raises the success rate on long-horizon manipulation tasks.
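The alternating structure of mutual fine-tuning can be pictured with a toy loop. The sketch below is not the authors' implementation: the scalar `gen_quality` and `policy_skill`, the helper functions, and both update rules are hypothetical stand-ins chosen only to show the control flow (generate references, track them in simulation, feed the trackable ones back to the generator, then keep training the policy).

```python
import random


def generate_reference(gen_quality, rng):
    # Hypothetical stand-in for the diffusion generator: returns a
    # "physical feasibility" score in [0, 1] instead of a real motion.
    return min(1.0, max(0.0, rng.gauss(gen_quality, 0.1)))


def track(feasibility, policy_skill, rng):
    # Hypothetical stand-in for a simulated rollout of the RL policy:
    # success is more likely for feasible references and skilled policies.
    return rng.random() < feasibility * policy_skill


def mutual_finetune(rounds=5, samples=200, seed=0):
    rng = random.Random(seed)
    gen_quality, policy_skill = 0.5, 0.5  # assumed starting points
    history = []
    for _ in range(rounds):
        # 1) The generator proposes reference motions.
        refs = [generate_reference(gen_quality, rng) for _ in range(samples)]
        # 2) The policy tries to track each reference in simulation.
        succeeded = [r for r in refs if track(r, policy_skill, rng)]
        history.append(len(succeeded) / samples)
        # 3) Generator step: shift toward references the policy could track.
        if succeeded:
            mean_ok = sum(succeeded) / len(succeeded)
            gen_quality += 0.5 * (mean_ok - gen_quality)
        # 4) Policy step: toy improvement from practicing on generated data.
        policy_skill = min(1.0, policy_skill + 0.1)
    return history


if __name__ == "__main__":
    print(mutual_finetune())
```

Each entry of the returned list is the tracking success rate for one round. In a real system, steps 3 and 4 would be gradient-based fine-tuning of the diffusion model and the RL policy respectively, not scalar updates.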

Motivation

Achievement

Figure 1

Figure 1: With the condition of text prompt, object geometry,

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ ์ƒ์„ฑ ๋ชจ๋ธ๊ณผ ๊ฐ•ํ™”ํ•™์Šต์˜ ์ƒํ˜ธ ๋ณด์™„์  ๊ฐ•์ ์„ ํšจ๊ณผ์ ์œผ๋กœ ๊ฒฐํ•ฉํ•˜์—ฌ ๋ฌผ๋ฆฌ์ ์œผ๋กœ ํ˜„์‹ค์ ์ธ ์žฅ๊ธฐ ์ธ๊ฐ„ํ˜• ๋กœ๋ด‡-๊ฐ์ฒด ์ƒํ˜ธ์ž‘์šฉ ์ƒ์„ฑ์ด๋ผ๋Š” ์ค‘์š”ํ•œ ๋ฌธ์ œ๋ฅผ ํ•ด๊ฒฐํ•˜์˜€๋‹ค. ํŠนํžˆ ์ƒํ˜ธ ๋ฏธ์„ธ์กฐ์ • ์ „๋žต๊ณผ key action ๊ธฐ๋ฐ˜ ํŒจ๋Ÿฌ๋‹ค์ž„์€ ๋†’์€ ๋…์ฐฝ์„ฑ์„ ๋ณด์—ฌ์ฃผ๋ฉฐ, ๊ด‘๋ฒ”์œ„ํ•œ ์‹คํ—˜์„ ํ†ตํ•ด ๋ฐฉ๋ฒ•์˜ ํšจ๊ณผ๋ฅผ ์ž…์ฆํ–ˆ์œผ๋‚˜ sim-to-real ๊ฒ€์ฆ์ด ๋ถ€์กฑํ•œ ์ ์ด ์•„์‰ฝ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •