Towards large language models as copilots for theorem proving in lean

์ €์ž: Baptiste Roziรจre, Jonas Gehring, Fabian Gloeckle, Sten Sootla, Itai Gat, Xiaoqing Ellen Tan, Yossi Adi, Jingyu Liu, Sauvestre, Romain, Tal Remez, Jรฉrรฉmy Rapin, Artyom Kozhevnikov, Ivan Evtimov, Joanna Bitton, Manish Bhatt, Cristian Canton Ferrer, Aaron Grattafiori, Wenhan Xiong, Alexandre Dรฉfossez, Jade Copet | ๋‚ ์งœ: 2024 | URL: https://arxiv.org/abs/2404.12534 📄 PDF


Essence

Figure 1

Figure 1: Lean Copilot provides a general framework for running LLM inference in Lean, either

Lean Copilot์€ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ(LLM)์„ Lean ์ฆ๋ช…๋ณด์กฐ๊ธฐ์— ์ง์ ‘ ํ†ตํ•ฉํ•˜์—ฌ ์ธ๊ฐ„ ์ˆ˜ํ•™์ž๊ฐ€ ์ •๋ฆฌ ์ฆ๋ช…์„ ๋ณด์กฐ๋ฐ›์„ ์ˆ˜ ์žˆ๋Š” neuro-symbolic ํ”„๋ ˆ์ž„์›Œํฌ์ด๋‹ค. ์ด๋ฅผ ํ†ตํ•ด ์ „์ˆ  ์ œ์•ˆ, ์ฆ๋ช… ํƒ์ƒ‰, ์ „์ œ ์„ ํƒ ๋“ฑ์˜ ์ฆ๋ช… ์ž๋™ํ™” ๋„๊ตฌ๋ฅผ ์ œ๊ณตํ•œ๋‹ค.

Motivation

Achievement

Figure 2

Figure 2: The frontend of Lean Copilotโ€™s SUGGEST_TACTICS. The user imports Lean Copilot just

How

Figure 1

Figure 1: Lean Copilot provides a general framework for running LLM inference in Lean, either

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ LLM์„ ์ฆ๋ช…๋ณด์กฐ๊ธฐ์— ์‹ค์ œ๋กœ ํ†ตํ•ฉํ•˜๋Š” ์‹ค์šฉ์ ์ด๋ฉด์„œ๋„ ํ˜์‹ ์ ์ธ ์ ‘๊ทผ๋ฒ•์„ ์ œ์‹œํ•˜๋ฉฐ, ์™„์ „ ์ž๋™ํ™” ๋Œ€์‹  ์ธ๊ฐ„ ์ „๋ฌธ๊ฐ€๋ฅผ ๋ณด์กฐํ•˜๋Š” copilot ํŒจ๋Ÿฌ๋‹ค์ž„์„ ๋„์ž…ํ•จ์œผ๋กœ์จ ํ˜„์‹ค ์ˆ˜ํ•™์ž์˜ ์ƒ์‚ฐ์„ฑ ํ–ฅ์ƒ์— ์ง์ ‘ ๊ธฐ์—ฌํ•œ๋‹ค. ์˜คํ”ˆ์†Œ์Šค ๊ณต๊ฐœ์™€ ๋†’์€ ์ž๋™ํ™” ์„ฑ๋Šฅ(AESOP ๋Œ€๋น„ 85% ํ–ฅ์ƒ)์œผ๋กœ ํ•™๊ณ„์™€ ์‹ค๋ฌด์— ์ฆ‰์‹œ ์ž„ํŒฉํŠธ๋ฅผ ๋ฏธ์น  ์ˆ˜ ์žˆ๋Š” ์šฐ์ˆ˜ํ•œ ์—ฐ๊ตฌ์ด๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
์ฆ๋ช… ์ƒ์„ฑ ๋ฐ ์ž๋™ํ™” ๊ด€๋ จ LLM์˜ ์‹ ๊ฒฝ์‹ฌ๋ณผ๋ฆญ ์ ‘๊ทผ์— ์ดˆ์ ์„ ๋งž์ถฐ Lean Copilot์˜ ์ด๋ก ์  ๊ธฐ๋ฐ˜๊ณผ ์ฐจ๋ณ„์ ์„ ์ดํ•ดํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
๋น„ํ˜•์‹์  ์ˆ˜ํ•™์—์„œ LLM ๊ธฐ๋ฐ˜ ์ •๋ฆฌ ์ฆ๋ช… ์ž๋™ํ™”์— ๊ด€๋ จํ•œ ์„ ํ–‰ ์‹œ๋„๋ฅผ ๋‹ด์•„ ์ด๋ก ์  ๋ฐฐ๊ฒฝ์ด ๋ฉ๋‹ˆ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
482๋ฒˆ ๋…ผ๋ฌธ์€ ์‚ฌ๊ณ ์™€ ์ฆ๋ช…์„ ๋ฒˆ๊ฐˆ์•„ ์ˆ˜ํ–‰ํ•˜๋Š” ์‹ ๊ฒฝ-์‹ฌ๋ณผ๋ฆญ ์ฆ๋ช… ํ”„๋ ˆ์ž„์›Œํฌ๋กœ, 1095์˜ Lean Copilot์— ์ด๋ก ์  ๊ธฐ๋ฐ˜์„ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
LLM์„ ์ˆ˜ํ•™ ์ •๋ฆฌ ์ฆ๋ช… ๋ถ„์•ผ์— ๋„์ž…ํ•œ ์ดˆ๊ธฐ ์—ฐ๊ตฌ๋กœ, 568์˜ ์ฆ๋ช… ๋ฐ์ดํ„ฐ ์ƒ์„ฑ ๋ฐฉ์‹์— ์ด๋ก ์  ์˜๊ฐ์„ ์ค๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
neuro-symbolic ์ฆ๋ช… ํƒ์ƒ‰์˜ ๋Œ€์•ˆ์  ๋ฐฉ๋ฒ•๋ก ์„ ์ œ์‹œํ•˜๋Š” ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
ํ˜•์‹ ์ˆ˜ํ•™ ์ฆ๋ช…์—์„œ LLM ํ™œ์šฉ์˜ ์œ ์‚ฌํ•œ ์ ‘๊ทผ๋ฒ•์„ ์ทจํ•œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
LLM์„ ํ™œ์šฉํ•œ ์ •๋ฆฌ ์ฆ๋ช… ์ž๋™ํ™”์˜ ๋Œ€์•ˆ์  ์ ‘๊ทผ๋ฒ•์„ ์ œ์‹œํ•œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
535๋Š” ์ˆ˜์น˜ ํ•ด์„ ๋ถ„์•ผ์—์„œ LLM์˜ ์ฆ๋ช…์ž๋™ํ™”, mult-agent ํ˜‘์—…์ฒด๊ณ„์˜ ํ†ตํ•ฉ ์ ์šฉ ์‚ฌ๋ก€๋กœ, 1095์˜ ์ด๋ฏธ์ง€์™€ ์ง€์‹๊ธฐ๋ฐ˜ ํ†ตํ•ฉ ์ฆ๋ช… ์ง€์›๊ณผ ์ ‘๊ทผ๋ฒ•์ด ๋‹ค๋ฆ…๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
Towards large language models as copilots for theorem proving ๋…ผ๋ฌธ์€ LLM์ด ์ฆ๋ช… ๋ณด์กฐ ๋„๊ตฌ๋กœ ์ง„ํ™”๋œ ํ™•์žฅ ์‚ฌ๋ก€๋ฅผ ๋‹ค๋ฃน๋‹ˆ๋‹ค.
← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •