Robustness evaluation of offline reinforcement learning for robot control against action perturbations

Authors: Yogesh K. Dwivedi, Nir Kshetri, Laurie Hughes, Emma Slade, Anand Jeyaraj, Arpan Kumar Kar, Abdullah M. Baabdullah, Alex Koohang, Vishnupriya Raghavan, Manju Ahuja, Hanaa Albanna, Mousa Ahmad Albashrawi, Adil S. Al-Busaidi, Janarthanan Balakrishnan, Yves Barlette, Sriparna Basu, Indranil Bose, Laurence Brooks, Dimitrios Buhalis, Lemuria Carter | Date: 2024 | URL: https://arxiv.org/abs/2412.18781


Essence

Figure 3: Testing-time robustness evaluation results under varying adversarial perturbation strengths in three legged

This paper evaluates how robust offline reinforcement learning (offline RL) methods are to action perturbations, and shows that existing methods are more vulnerable than online RL.

Motivation

Achievement

Figure 3: Testing-time robustness evaluation results under varying adversarial perturbation strengths in three legged

How

Figure 1: Overview of the robustness evaluation for offline RL. Offline RL models are trained on varying-quality offline
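
As a rough illustration of the test-time evaluation loop this overview describes, the sketch below rolls out a fixed policy while perturbing each action before it reaches the environment, then reports the average return at several perturbation strengths. Everything in it is an assumption made for illustration: `PointEnv`, `policy`, `perturb`, and `evaluate` are hypothetical names, the toy 1-D task stands in for the legged-robot simulators, a proportional controller stands in for a trained offline RL policy, and bounded uniform noise stands in for the perturbations studied in the paper.

```python
import random


class PointEnv:
    """Hypothetical 1-D toy task: drive position x toward 0 with actions in [-1, 1]."""

    def __init__(self, horizon=50):
        self.horizon = horizon
        self.x = 0.0
        self.t = 0

    def reset(self, rng):
        self.x = rng.uniform(-1.0, 1.0)
        self.t = 0
        return self.x

    def step(self, action):
        action = max(-1.0, min(1.0, action))  # enforce the valid action range
        self.x += 0.1 * action
        self.t += 1
        reward = -abs(self.x)                 # closer to 0 is better
        done = self.t >= self.horizon
        return self.x, reward, done


def policy(x):
    """Stand-in for a trained offline RL policy (assumption: simple proportional control)."""
    return max(-1.0, min(1.0, -10.0 * x))


def perturb(action, epsilon, rng):
    """Add noise drawn uniformly from [-epsilon, epsilon], then clip to [-1, 1]."""
    noisy = action + rng.uniform(-epsilon, epsilon)
    return max(-1.0, min(1.0, noisy))


def evaluate(epsilon, episodes=20, seed=0):
    """Average episodic return when every action is perturbed at test time."""
    rng = random.Random(seed)
    env = PointEnv()
    total = 0.0
    for _ in range(episodes):
        x = env.reset(rng)
        done = False
        while not done:
            action = perturb(policy(x), epsilon, rng)
            x, reward, done = env.step(action)
            total += reward
    return total / episodes


if __name__ == "__main__":
    for eps in (0.0, 0.5, 1.0):
        print(f"epsilon={eps:.1f}  average return={evaluate(eps):.3f}")
```

Using the same seed for every perturbation strength keeps the initial states identical across runs, so differences in return come from the perturbation alone. An adversarial variant would replace the uniform draw in `perturb` with a search for the worst-case action offset inside the same bound.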

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is the first to address robustness to action perturbations, which matters for real-world applications of offline RL, and it clearly demonstrates the vulnerability of existing methods. The lack of proposed solutions and the restriction to simulated environments are drawbacks, but the work is valuable as an important benchmark study for developing robust offline RL.

Papers worth reading alongside

Foundational work
859 addresses language-model-based fact verification and intrinsic reliability, offering a theoretical foundation for discussions of reliability in RL action and reward evaluation.
Foundational work
The paper Evaluation of openai o1 discusses the practical limitations and methodology of AI agent evaluation, and is also applicable to the offline RL robustness analysis of 688.
Foundational work
688 covers generalization evaluation of offline reinforcement learning and robustness experiments across various environments, serving as an experimental reference when applying the sharpness-aware minimization of 422.
Foundational work
449 presents a strategy for scaling the combination of large language models and RL, providing a theoretical basis for RL generalization and vulnerability issues.
Foundational work
The robustness evaluation of offline reinforcement learning and its applications across various environments provide experimental grounding for this paper's simultaneous learning for robot control.
Different approach
As a robustness evaluation of offline RL for robot control, it can be connected to the limitations and safety issues of RL-based microfluidic experiment control.
Different approach
891 evaluates robustness differently, in the sim-to-real zero-shot transfer problem for RL policies, which allows comparison with the offline RL robustness study of 688.
Different approach
The paper Robustness evaluation of offline reinforcement learning for science addresses RL safety and robustness through experiment-based evaluation rather than the CBF approach, offering an alternative discussion of RL safety.
Follow-up work
A paper that directly and experimentally analyzes the robustness of RL algorithms in real microfluidic control experiments, making it a practical follow-up study.
Follow-up work
The paper Iterative Distillation for Reward-Guided Fine-Tuning of Diffusion Models additionally covers methods for improving RL models through reward-based tuning and robustness-enhancing approaches.
Follow-up work
456 advances RL-based model robustness in a new form by generating physics-constraint-enhanced neural networks from natural language.
Application
868 covers cases of AI-based biomedical experiment automation, so it is connected to the feasibility of RL-based control systems.