FLaRe: Achieving Masterful and Adaptive Robot Policies with Large-Scale Reinforcement Learning Fine-Tuning

Authors: Jiaheng Hu, Rose Hendrix, Ali Farhadi, Aniruddha Kembhavi, Roberto Martin-Martin, Peter Stone, Kuo-Hao Zeng, Kiana Ehsani | Date: 2024-09-25 | URL: https://arxiv.org/abs/2409.16578


Essence

Figure 1

Fig. 1: FLaRe is a simple but effective approach for

FLaRe is a framework that uses Reinforcement Learning to effectively fine-tune robot policies pretrained with large-scale multi-task Behavior Cloning, and it overcomes performance plateaus through gradient stabilization techniques.
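To make "gradient stabilization" concrete, here is a minimal sketch of one widely used stabilizer, global-norm gradient clipping. This is an illustrative assumption: the review does not say which techniques FLaRe uses, and the function name `clip_by_global_norm`, the threshold `max_norm`, and the toy gradients are all hypothetical.

```python
import math


def clip_by_global_norm(grads, max_norm):
    """Rescale gradient vectors so their joint L2 norm is at most max_norm.

    Hypothetical helper for illustration only; not taken from the FLaRe paper.
    Returns the (possibly rescaled) gradients and the norm before clipping.
    """
    # Joint L2 norm over every component of every gradient vector.
    total_norm = math.sqrt(sum(g * g for vec in grads for g in vec))
    if total_norm == 0.0 or total_norm <= max_norm:
        # Already within budget: return copies unchanged.
        return [list(vec) for vec in grads], total_norm
    # Shrink all components by the same factor so the direction is preserved.
    scale = max_norm / total_norm
    return [[g * scale for g in vec] for vec in grads], total_norm


# Toy gradients for two parameter groups (made-up numbers).
grads = [[3.0, 0.0], [0.0, 4.0]]
clipped, norm_before = clip_by_global_norm(grads, max_norm=1.0)
norm_after = math.sqrt(sum(g * g for vec in clipped for g in vec))
print(norm_before, norm_after)
```

Clipping of this kind bounds the size of any single update while keeping its direction, which is one common way to keep fine-tuning from being derailed by occasional very large gradients.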

Motivation

Achievement

How

Figure 2

Fig. 2: FLaRe introduces a series of design choices that help stabilize the RL training process, including 1) fine-tunin

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: FLaRe clearly diagnoses the practical problems of fine-tuning large-scale robot policies and resolves them through systematic design choices, achieving major performance gains both in simulation and on real robots. In particular, its successful application of gradient stabilization techniques and large-scale RL training marks an important advance for robot foundation models.
