Guided by guardrails: Control barrier functions as safety instructors for robotic learning

Authors: Maeva Guerrier, Karthik Soma, Hassan Fouad, Giovanni Beltrame | Date: 2025 | DOI: arXiv:2505.18858


Essence

The paper addresses the safety problem in reinforcement learning (RL) by using Control Barrier Functions (CBFs). It proposes three ways of integrating a CBF into training so that a robot learns safe behavior while keeping its goal-reaching performance.

Motivation

Achievement

Figure 1

Figure 1: The three safety-guardrail variants - filter (green), reward-based (orange), decay (blue)

  1. Proposes three CBF-RL integration schemes:
    • CBF Filter: when the agent enters a dangerous region, corrects its action with minimal intervention
    • CBF Reward: adds the deviation from the CBF-proposed action to the reward function as a penalty
    • CBF Decay: a curriculum-learning scheme that gradually removes the CBF's influence over the course of training
  2. Demonstrates real-world applicability:
    • Abstracts the robot as a simple unicycle model, so the approach can be applied to a variety of robot dynamics
    • Successfully deploys a policy trained in simulation on a four-wheel differential drive robot
    • Evaluates sim2real transfer performance
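
The three schemes listed above differ mainly in where the CBF touches the training loop: the executed action, the reward, or both with a fading weight. The following is a minimal sketch of that distinction, not the authors' implementation. The function names, the squared-error penalty, the `penalty` weight, and the linear decay schedule with action blending are all assumptions made for illustration; `a_cbf` is assumed to come from some CBF controller that returns the agent's action unchanged whenever it is already safe.

```python
import numpy as np


def decay_weight(step, total_steps):
    """Hypothetical linear schedule: CBF influence goes from 1.0 down to 0.0."""
    return max(0.0, 1.0 - step / total_steps)


def apply_guardrail(mode, a_rl, a_cbf, reward, step=0, total_steps=1, penalty=1.0):
    """Return (executed_action, training_reward) for one environment step.

    a_rl  : action proposed by the RL policy
    a_cbf : minimally corrected safe action from a CBF controller (assumed given)
    """
    a_rl = np.asarray(a_rl, dtype=float)
    a_cbf = np.asarray(a_cbf, dtype=float)
    if mode == "filter":
        # Filter: always execute the CBF-corrected action, reward untouched.
        return a_cbf, reward
    if mode == "reward":
        # Reward: execute the agent's own action, penalize its deviation
        # from the CBF-proposed action (squared error is an assumption).
        deviation = float(np.sum((a_rl - a_cbf) ** 2))
        return a_rl, reward - penalty * deviation
    if mode == "decay":
        # Decay: blend toward the agent's action as training progresses
        # (blending is one plausible reading of "gradually removes").
        w = decay_weight(step, total_steps)
        return w * a_cbf + (1.0 - w) * a_rl, reward
    raise ValueError(f"unknown mode: {mode}")
```

For example, `apply_guardrail("decay", a_rl, a_cbf, r, step=50, total_steps=100)` executes the midpoint of the two actions under this assumed schedule, while at `step=100` the agent acts entirely on its own.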

How

Figure 2

Figure 2: Obstacle-avoidance CBF construction for the unicycle model - uses a point x' shifted by ε along the robot's axis
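
The shifted point in Figure 2 matters because, for a unicycle, the position of the robot center does not depend directly on the angular velocity, whereas a point ε ahead on the robot's axis responds to both control inputs. The sketch below illustrates this under stated assumptions: a circular obstacle, the barrier h = ||x' - x_obs||² - r², a linear class-K term `alpha * h`, and the closed-form solution of the single-constraint minimum-norm correction. None of these specifics are taken from the paper, and all names are hypothetical.

```python
import numpy as np


def lookahead_point(state, eps):
    """Point x' located eps ahead of the robot center along its heading."""
    x, y, theta = state
    return np.array([x + eps * np.cos(theta), y + eps * np.sin(theta)])


def cbf_filter(state, u_nom, obstacle, radius, eps=0.1, alpha=1.0):
    """Minimally modify u_nom = (v, omega) so that h_dot + alpha * h >= 0.

    Assumed barrier: h = ||x' - obstacle||^2 - radius^2.
    Returns (u_safe, h).
    """
    _, _, theta = state
    p = lookahead_point(state, eps)
    # Kinematics of the shifted point: p_dot = G @ [v, omega].
    # det(G) = eps, so G is invertible whenever eps != 0.
    G = np.array([[np.cos(theta), -eps * np.sin(theta)],
                  [np.sin(theta), eps * np.cos(theta)]])
    d = p - np.asarray(obstacle, dtype=float)
    h = float(d @ d - radius ** 2)
    a = 2.0 * G.T @ d            # h_dot = a @ u
    b = alpha * h
    u_nom = np.asarray(u_nom, dtype=float)
    slack = float(a @ u_nom + b)
    if slack >= 0.0 or float(a @ a) < 1e-12:
        # Nominal action already satisfies the constraint (or it is degenerate).
        return u_nom, h
    # Project u_nom onto the half-space {u : a @ u + b >= 0}.
    return u_nom - (slack / float(a @ a)) * a, h


# Robot at the origin facing an obstacle 1 m ahead, commanded to drive forward.
u_safe, h = cbf_filter((0.0, 0.0, 0.0), (1.0, 0.0), obstacle=(1.0, 0.0), radius=0.5)
```

In this example the forward velocity is reduced so that the barrier condition holds with equality, while a nominal command that drives away from the obstacle is returned unchanged.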

Technical implementation:

Originality

Limitation & Further Study

Limitations:

Directions for future work:

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: The paper presents a practical and creative approach to the RL safety problem using CBFs, a theoretically sound tool, and demonstrates its practical value by comparing the three integration schemes and validating them through sim2real transfer. Performance evaluation in more complex environments and with dynamic obstacles remains future work.

Related papers worth reading

Foundational work
The Guided by guardrails paper discusses in detail the control methods that form the theoretical basis for the safety and ethical defense mechanisms proposed by SafeScientist.
Alternative approach
The 'Draft, sketch, and prove' paper deals with safety and constraint-based search in the automated theorem proving stage, and can be compared with RL safety research.
Alternative approach
Paper no. 400 addresses the safety and efficiency of LLM-based agents through memory and goal management strategies, whereas paper no. 395 brings control theory into RL safety, so the two papers allow safety-enhancement approaches to be compared from several angles.
Alternative approach
The Robustness evaluation of offline reinforcement learning for science paper treats the safety and robustness of RL through experiment-based evaluation rather than a CBF approach, offering an alternative discussion of RL safety.
Alternative approach
Paper no. 422 approaches the same problem of ensuring stability and safety in reinforcement learning from the perspective of flat loss minimization.
Alternative approach
The SafeScientist paper covers risk recognition and mitigation protocols for LLM-based scientific experiments, showing an alternative approach that can be compared with RL safety.
Follow-up work
The TrustLLM paper covers trustworthiness and safety evaluation of LLM/RL-based systems, which helps in considering the safety aspects of reinforcement learning more broadly.
Follow-up work
Paper no. 845 presents a self-verifying reinforcement learning framework and suggests possible synergy or complementary application with the safety-oriented barrier functions of paper no. 395.
Follow-up work
The Guided by guardrails paper closely examines the safety and performance of CBF-based reinforcement learning in real-world robot control.
Application case
The Reinforcement Learning for Dynamic Microfluidic Control paper implements safety mechanisms in real RL-based experiment control, providing a practical application case for safe control through CBFs.