BFM-Zero: A Promptable Behavioral Foundation Model for Humanoid Control Using Unsupervised Reinforcement Learning

Authors: Yitang Li, Zhengyi Luo, Tonghe Zhang, Cunxi Dai, Anssi Kanervisto, Andrea Tirinzoni, Haoyang Weng, Kris Kitani, Mateusz Guzek, Ahmed Touati, Alessandro Lazaric, Matteo Pirotta, Guanya Shi | Date: 2025-11-06 | DOI: 10.48550/arXiv.2511.04131


Essence

Figure 2

Figure 2: An overview of the BFM-Zero framework. After the pre-training stage, BFM-Zero forms a latent

BFM-Zero presents a promptable behavioral foundation model that uses unsupervised RL and a Forward-Backward model to perform a wide range of humanoid robot control tasks with a single policy. It embeds motions, goals, and rewards in a shared latent space, which enables zero-shot inference and few-shot adaptation.
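
To make the "shared latent space" idea concrete, here is a minimal sketch of how a goal, a reward, and a motion clip could each be turned into a prompt vector z under the general Forward-Backward recipe. It is an illustration under stated assumptions, not the paper's implementation: the backward embedding is replaced by a fixed random linear map, the rescaling of z to norm sqrt(d) is a convention borrowed from FB methods in general, and every function name (`make_backward_map`, `goal_prompt`, `reward_prompt`, `tracking_prompt`) is hypothetical.

```python
import math
import random


def make_backward_map(state_dim, latent_dim, seed=0):
    """Toy stand-in for a learned backward embedding B(s): a fixed random linear map."""
    rng = random.Random(seed)
    weights = [[rng.gauss(0.0, 1.0) for _ in range(state_dim)]
               for _ in range(latent_dim)]

    def backward(state):
        return [sum(w * x for w, x in zip(row, state)) for row in weights]

    return backward


def normalize(z):
    """Rescale z to norm sqrt(d) (assumed convention); a zero vector is returned unchanged."""
    norm = math.sqrt(sum(v * v for v in z))
    if norm == 0.0:
        return list(z)
    scale = math.sqrt(len(z)) / norm
    return [v * scale for v in z]


def goal_prompt(backward, goal_state):
    """Goal reaching: z is the embedding of the target state."""
    return normalize(backward(goal_state))


def reward_prompt(backward, states, rewards):
    """Reward optimization: z is the reward-weighted mean of B(s) over sampled states."""
    embeddings = [backward(s) for s in states]
    dim = len(embeddings[0])
    mean = [sum(r * e[i] for r, e in zip(rewards, embeddings)) / len(states)
            for i in range(dim)]
    return normalize(mean)


def tracking_prompt(backward, upcoming_frames):
    """Motion tracking: z is the mean embedding of a short window of reference frames."""
    return reward_prompt(backward, upcoming_frames, [1.0] * len(upcoming_frames))


# Usage: three kinds of prompt land in the same latent space.
B = make_backward_map(state_dim=6, latent_dim=4)
rng = random.Random(1)
states = [[rng.uniform(-1.0, 1.0) for _ in range(6)] for _ in range(8)]
rewards = [-abs(s[0]) for s in states]  # hypothetical reward: keep the first coordinate near zero

z_goal = goal_prompt(B, states[0])
z_reward = reward_prompt(B, states, rewards)
z_track = tracking_prompt(B, states[:3])
```

All three prompts have the same dimension, so a single z-conditioned policy can consume any of them without retraining. That is the sense in which the model is "promptable" in this sketch.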

Motivation

Achievement

Figure 1

Figure 1: BFM-Zero enables versatile and robust whole-body skills. (A-C) Diverse zero-shot inference

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: BFM-Zero is the first work to successfully realize a promptable foundation model in real-world humanoid robot deployment through unsupervised RL, and it offers a practical solution that balances zero-shot multi-task execution with few-shot adaptation. It is an important contribution that points toward a paradigm shift in robot control.
