TokenHSI: Unified Synthesis of Physical Human-Scene Interactions through Task Tokenization

์ €์ž: Liang Pan, Zeshi Yang, Zhiyang Dou, Wenjia Wang, Buzhen Huang, Bo Dai, Taku Komura, Jingbo Wang | ๋‚ ์งœ: 2025-03-25 | URL: https://arxiv.org/abs/2503.19901 📄 PDF


Essence

Figure 1

Figure 1. Introducing TokenHSI, a unified model that enables physics-based characters to perform diverse human-scene int

TokenHSI is a unified, transformer-based policy. It models the humanoid's proprioception as a shared token and combines it with task tokens through a masking mechanism, so that diverse human-scene interaction (HSI) skills are handled by a single network.
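The token-and-mask idea can be sketched as a single attention step. This is a minimal, dependency-free sketch under stated assumptions, not the authors' implementation: `build_mask`, `masked_attention_readout`, `readout_for`, and the task names are hypothetical; the transformer is reduced to one attention head with random (untrained) weights; and the proprioception token is assumed to act as the query whose output is read out. What it shows is that a masked task token receives exactly zero attention weight, so its contents cannot influence the output.

```python
import math
import random

# Hypothetical sketch: one attention head standing in for the TokenHSI transformer.

def linear(weights, vec):
    """Multiply a (rows x cols) matrix, stored as nested lists, by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def build_mask(task_names, active):
    """Always keep the proprioception token (index 0); keep task tokens only if active."""
    return [True] + [name in active for name in task_names]

def masked_softmax(scores, mask):
    """Softmax in which masked-out positions get exactly zero weight."""
    top = max(s for s, keep in zip(scores, mask) if keep)
    exps = [math.exp(s - top) if keep else 0.0 for s, keep in zip(scores, mask)]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention_readout(tokens, mask, wq, wk, wv):
    """Attend from the proprioception token (tokens[0]) to every unmasked token."""
    d = len(tokens[0])
    query = linear(wq, tokens[0])
    keys = [linear(wk, t) for t in tokens]
    values = [linear(wv, t) for t in tokens]
    scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(d) for key in keys]
    weights = masked_softmax(scores, mask)
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]

# Usage with random stand-in tokens and weights (not learned, purely illustrative).
rng = random.Random(0)
D = 4
task_names = ["follow", "sit", "climb", "carry"]  # hypothetical task set
proprio = [rng.uniform(-1, 1) for _ in range(D)]
task_tokens = {name: [rng.uniform(-1, 1) for _ in range(D)] for name in task_names}
wq, wk, wv = ([[rng.uniform(-1, 1) for _ in range(D)] for _ in range(D)] for _ in range(3))

def readout_for(active):
    """Run the shared attention step with only the given tasks unmasked."""
    tokens = [proprio] + [task_tokens[name] for name in task_names]
    return masked_attention_readout(tokens, build_mask(task_names, active), wq, wk, wv)

print(readout_for({"sit"}))
```

Because the mask is applied before the softmax, switching or combining tasks only changes which tokens are kept; the projection weights stay shared across the proprioception token and every task token.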

Motivation

Achievement


How

Figure 2

Figure 2. TokenHSI consists of two stages: (left) foundational skill learning and (right) policy adaptation. Through mul
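The policy adaptation stage named in the Figure 2 caption relies on the transformer accepting a variable number of tokens. The sketch below illustrates only the bookkeeping side of that idea, under the simplifying assumption that adaptation freezes the pretrained backbone and tokenizers and appends a new trainable tokenizer. `Tokenizer`, `TokenPolicy`, and the task names are hypothetical, and each tokenizer is a toy scaling function rather than a learned network.

```python
from dataclasses import dataclass, field

@dataclass
class Tokenizer:
    """Hypothetical stand-in for a task tokenizer: maps one observation to one token."""
    name: str
    scale: float
    trainable: bool = True

    def __call__(self, obs):
        return [self.scale * x for x in obs]

@dataclass
class TokenPolicy:
    """Hypothetical container: pretrained parts get frozen, new tokenizers get appended."""
    tokenizers: list = field(default_factory=list)
    backbone_trainable: bool = True

    def freeze_pretrained(self):
        # Assumed adaptation recipe: freeze the shared backbone and existing tokenizers.
        self.backbone_trainable = False
        for tok in self.tokenizers:
            tok.trainable = False

    def add_tokenizer(self, tok):
        # A new token slot; the backbone itself is unchanged.
        self.tokenizers.append(tok)

    def token_sequence(self, observations):
        # One token per tokenizer whose observation is present, so the length varies.
        return [tok(observations[tok.name]) for tok in self.tokenizers if tok.name in observations]

    def trainable_names(self):
        names = [tok.name for tok in self.tokenizers if tok.trainable]
        return (["backbone"] if self.backbone_trainable else []) + names

policy = TokenPolicy([Tokenizer("proprio", 1.0), Tokenizer("sit", 0.5), Tokenizer("carry", 2.0)])
policy.freeze_pretrained()
policy.add_tokenizer(Tokenizer("terrain", 1.5))  # hypothetical new task input
obs = {"proprio": [0.1, 0.2], "sit": [1.0, 1.0], "terrain": [2.0, 0.0]}
seq = policy.token_sequence(obs)
print(len(seq), policy.trainable_names())  # 3 ['terrain']
```

Since only the appended tokenizer is marked trainable, adapting to a new input costs far fewer parameters than retraining the whole policy, and the sequence simply grows or shrinks with the inputs that are present.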

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: TokenHSI๋Š” ๋…๋ฆฝ์  proprioception tokenizer์™€ masking mechanism์„ ํ†ตํ•ด ๋‹ค์ค‘ HSI ๊ธฐ์ˆ ์„ ๋‹จ์ผ ๋„คํŠธ์›Œํฌ์—์„œ ํšจ๊ณผ์ ์œผ๋กœ ํ†ตํ•ฉํ•˜๊ณ , ๋ณ€์ˆ˜ ๊ธธ์ด ์ž…๋ ฅ์„ ํ™œ์šฉํ•œ ํšจ์œจ์  ์ •์ฑ… ์ ์‘๊นŒ์ง€ ์‹คํ˜„ํ•œ ํ˜์‹ ์ ์ธ ์ ‘๊ทผ๋ฒ•์œผ๋กœ, ์ปดํ“จํ„ฐ ์• ๋‹ˆ๋ฉ”์ด์…˜๊ณผ embodied AI ๋ถ„์•ผ์—์„œ ์‹ค์งˆ์ ์ธ ๊ธฐ์—ฌ๋ฅผ ํ•œ๋‹ค.

← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •