Stability of Control Lyapunov Function Guided Reinforcement Learning

Authors: Zachary Olkin, William D. Compton, Aaron D. Ames | Date: 2026-05-03 | URL: https://arxiv.org/abs/2605.01978


Essence

Figure 1

This paper analyzes the theoretical stability of control policies trained with Control Lyapunov Function (CLF) guided reinforcement learning (CLF-RL). It recasts the problem as an optimal control problem in both continuous and discrete time, proves exponential stability, and validates the result through numerical verification and periodic walking experiments on a humanoid robot.

Motivation

Achievement

Figure 2

• Theorem 1 (continuous-time exponential stability): The optimal policy π* is locally exponentially stable near the origin and satisfies the explicit convergence bound ∥x(t)∥ ≤ √[c₂(γ + 2L) / (c₁(γ + λ))] e^(-λt/2) ∥x₀∥.
• Theorem 2 (discrete-time extension): The same exponential stability property is proven for discrete-time dynamics, together with a robustness guarantee.
• Numerical verification: On double integrator and cart-pole systems, the theoretical bound is confirmed to agree with the numerical solutions.
• Experimental validation: CLF-RL is applied to learning periodic gait tracking on the Unitree G1 humanoid robot and achieves stable tracking of the walking trajectory.
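To make the shape of such a convergence bound concrete, the following is a minimal sketch of a numerical envelope check on a double integrator. It does not reproduce the paper's setup or the constants of Theorem 1: the feedback gain `K`, the choice `Q = I`, the initial state, and the helper names `solve_lyapunov`, `simulate`, and `envelope_check` are all assumptions made for illustration. It uses the textbook quadratic Lyapunov envelope ∥x(t)∥ ≤ √(c₂/c₁) e^(-λt/2) ∥x₀∥, where c₁ and c₂ are the smallest and largest eigenvalues of P and λ = λ_min(Q)/λ_max(P).

```python
import numpy as np


def solve_lyapunov(A, Q):
    """Solve A^T P + P A = -Q for symmetric P via a Kronecker linear system."""
    n = A.shape[0]
    I = np.eye(n)
    # Row-major vec identity: vec(A^T P) = (A^T kron I) vec(P), vec(P A) = (I kron A^T) vec(P)
    M = np.kron(A.T, I) + np.kron(I, A.T)
    P = np.linalg.solve(M, -Q.reshape(-1)).reshape(n, n)
    return 0.5 * (P + P.T)  # symmetrize against round-off


def simulate(A, x0, dt, steps):
    """Integrate x' = A x with classical RK4; return times and states."""
    xs = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        x = xs[-1]
        k1 = A @ x
        k2 = A @ (x + 0.5 * dt * k1)
        k3 = A @ (x + 0.5 * dt * k2)
        k4 = A @ (x + dt * k3)
        xs.append(x + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4))
    return dt * np.arange(steps + 1), np.array(xs)


def envelope_check(K=(2.0, 3.0), x0=(1.0, -0.5), dt=1e-3, steps=10000):
    """Check the quadratic-Lyapunov exponential envelope on a double integrator.

    K, x0, dt, and steps are illustrative assumptions, not values from the paper.
    """
    A = np.array([[0.0, 1.0], [0.0, 0.0]])   # double integrator drift
    B = np.array([[0.0], [1.0]])             # input enters the acceleration
    A_cl = A - B @ np.array([K])             # closed loop under u = -K x
    Q = np.eye(2)
    P = solve_lyapunov(A_cl, Q)

    eig_P = np.linalg.eigvalsh(P)
    c1, c2 = eig_P[0], eig_P[-1]             # c1 |x|^2 <= x^T P x <= c2 |x|^2
    lam = np.linalg.eigvalsh(Q)[0] / c2      # V' = -x^T Q x <= -lam V

    ts, xs = simulate(A_cl, x0, dt, steps)
    norms = np.linalg.norm(xs, axis=1)
    bound = np.sqrt(c2 / c1) * np.exp(-0.5 * lam * ts) * norms[0]
    holds = bool(np.all(norms <= bound + 1e-9))
    return {"P": P, "c1": c1, "c2": c2, "lam": lam,
            "ts": ts, "norms": norms, "bound": bound, "holds": holds}


if __name__ == "__main__":
    result = envelope_check()
    print("decay rate lambda:", result["lam"])
    print("envelope holds on the whole trajectory:", result["holds"])
```

The envelope here is typically conservative: the closed-loop trajectory decays faster than the certified rate, which is expected because the rate comes from worst-case eigenvalue bounds on P and Q.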

How

• Assumptions 1-2 state the existence of a CLF and of an optimal policy; four lemmas (Lemmas 1-4) are then proven to establish positive definiteness and forward invariance.
• Lemma 1: Proves positive definiteness, i.e. J(0) = 0 and J(x) > 0 for x ≠ 0.
• Lemma 2: Derives the upper bound J(x) ≤ β/(γ+λ) V(x).
• Lemma 3: Derives J̇(x) ≤ -λJ*(x) from the HJB equation and the optimality condition.
• Lemma 4: Proves compactness and invariance.
• For discrete time, a similar argument is applied to the difference equation, and an extension that includes additional practical reward terms is presented.
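The way a decrease condition like the one in Lemma 3 turns into an exponential bound on the state can be sketched with the standard comparison argument. The constants \underline{a} and \bar{a} below are hypothetical placeholders for quadratic lower and upper bounds on the value function; how they map onto the paper's c₁, c₂, γ, λ, β, and L is not stated in this summary and is not re-derived here.

```latex
% Assumed (placeholder) sandwich bounds on the optimal value function:
%   \underline{a}\,\|x\|^2 \le J^*(x) \le \bar{a}\,\|x\|^2,
% and the decrease condition along closed-loop trajectories:
%   \dot{J}^*(x(t)) \le -\lambda\, J^*(x(t)).
\begin{align*}
  J^*(x(t)) &\le e^{-\lambda t}\, J^*(x_0)
    && \text{(comparison lemma)} \\
  \underline{a}\,\|x(t)\|^2 &\le J^*(x(t))
    \le e^{-\lambda t}\, J^*(x_0)
    \le \bar{a}\, e^{-\lambda t}\, \|x_0\|^2 \\
  \|x(t)\| &\le \sqrt{\frac{\bar{a}}{\underline{a}}}\;
    e^{-\lambda t/2}\, \|x_0\|
\end{align*}
```

The halved rate λ/2 in the exponent appears because the value function is bounded by the squared norm, so taking the square root halves the decay rate; this matches the e^(-λt/2) factor in the Theorem 1 bound.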

Originality

• Presents the first mathematical proof that CLF-RL actually guarantees exponential stability.
• Strengthens the theoretical foundation of RL by combining the optimal control framework with Lyapunov stability theory.
• Provides a comprehensive analysis that covers both continuous and discrete time and also includes additional practical reward terms.

Limitation & Further Study

• The existence conditions in Assumptions 1-2 are declarative, and constructing a CLF is not trivial for every real system.
• Only local stability is proven, and the size of the region of attraction is not explicitly quantified.
• The humanoid experiments are limited to periodic trajectory tracking, so generalization (disturbance rejection, off-nominal situations) is not verified.
Further study: deriving global stability conditions, developing automatic CLF generation methods, and analyzing robustness under model uncertainty.

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is an important contribution that backs the practical success of CLF-RL with theory; the exponential stability proof is clear and is treated comprehensively in both continuous and discrete time. Its limitations are the restriction to local stability, the lack of a practical CLF construction method, and limited experimental validation, but it is a valuable first step toward narrowing the gap between control theory and RL.
