Large Language Models Orchestrating Structured Reasoning Achieve Kaggle Grandmaster Level

Authors: Antoine Grosnit, Alexandre Maraval, Refinath S N, Zichao Zhao, James Doran | Date: 2024 | DOI: arXiv:2411.03562


Essence

Figure 2. From Scaffolded Experiential Learning to Autonomous Generalisation.

This paper presents a structured learning framework for LLM agents, grounded in Kolb's experiential learning theory and Vygotsky's zone of proximal development (ZPD). Through an autonomous system called Agent K, it reaches human-level performance (Elo-MMR 1694) in Kaggle data science competitions, demonstrating the generalisation ability of LLM-based agents.

Motivation

Achievement

Figure 4. Comparison of Agent K's Elo-MMR score with that of human participants.

Key achievements of Agent K:
1. Implements a fully autonomous data science pipeline across 81 Kaggle tasks and reaches an Elo-MMR of 1694, above the median of Kaggle Masters (top 2%).
2. Achieves medal-level performance equivalent to 4 gold and 4 silver medals in prize-awarding competitions, and 5 gold, 8 silver, and 12 bronze medals across competitions of various types.
3. Demonstrates consistent human-competitive performance across diverse domains, including tabular data, computer vision, and natural language processing.
4. Unlike earlier automation approaches (AutoML and similar), manages the entire data science pipeline autonomously while being compared directly with human participants on the official final leaderboards.
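One way to read "medal-level performance" is as a mapping from a final leaderboard rank to a medal tier. The sketch below does this with the thresholds from Kaggle's public progression rules as best recalled here. This review does not state the paper's exact conversion, so the cutoffs, the round-down rule for percentages, and the function name `medal_for_rank` are assumptions. The sample placements are invented and are not results from the paper.

```python
from collections import Counter
from typing import Optional


def medal_for_rank(rank: int, teams: int) -> Optional[str]:
    """Return 'gold', 'silver', 'bronze', or None for a final leaderboard rank."""
    if teams < 1 or not 1 <= rank <= teams:
        raise ValueError("rank must lie between 1 and the number of teams")

    def top_percent(pct: int) -> int:
        # Assumption: percentage cutoffs round down to a whole rank.
        return teams * pct // 100

    # Assumed thresholds, keyed by competition size.
    if teams < 100:
        gold, silver, bronze = top_percent(10), top_percent(20), top_percent(40)
    elif teams < 250:
        gold, silver, bronze = 10, top_percent(20), top_percent(40)
    elif teams < 1000:
        # "Top 10 + 0.2%": one extra gold medal per 500 teams.
        gold, silver, bronze = 10 + teams // 500, 50, 100
    else:
        gold, silver, bronze = 10 + teams // 500, top_percent(5), top_percent(10)

    if rank <= gold:
        return "gold"
    if rank <= silver:
        return "silver"
    if rank <= bronze:
        return "bronze"
    return None


# Hypothetical (rank, team count) pairs, invented for illustration only.
placements = [(11, 500), (40, 1200), (90, 1200), (300, 1200)]
tally = Counter(medal_for_rank(rank, teams) for rank, teams in placements)
print(tally)
```

Tallying tiers over many competitions in this way gives a count in the same form as the medal summary above, with `None` marking placements outside the medal range.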

How

Figure 7. Two-stage scaffolded learning environment in Agent K.

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 5/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ ์ธ์ง€ ๊ณผํ•™ ์ด๋ก ์„ ๊ธฐ๋ฐ˜์œผ๋กœ LLM ์—์ด์ „ํŠธ์˜ ๊ตฌ์กฐํ™”๋œ ํ•™์Šต ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์‹œํ•˜๊ณ , ์‹ค์ œ ๊ฒฝ์Ÿ ํ™˜๊ฒฝ์—์„œ ์ธ๊ฐ„ ์ˆ˜์ค€์˜ ์„ฑ๋Šฅ์„ ๋‹ฌ์„ฑํ•จ์œผ๋กœ์จ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ ์žˆ๋Š” AI ๊ฐœ๋ฐœ์— ์˜๋ฏธ ์žˆ๋Š” ์ง„์ „์„ ๋ณด์˜€๋‹ค. Kolb์˜ ๊ฒฝํ—˜ ํ•™์Šต ์ด๋ก ๊ณผ Vygotsky์˜ ZPD๋ฅผ ๊ณ„์‚ฐ์ ์œผ๋กœ ๊ตฌํ˜„ํ•œ ์‹œ๋„๋Š” ๋…์ฐฝ์ ์ด๋ฉฐ, Kaggle์—์„œ์˜ ๊ด‘๋ฒ”์œ„ํ•œ ์‹ค์ฆ์  ๊ฒ€์ฆ์€ ๋ฐฉ๋ฒ•๋ก ์˜ ์‹ค์šฉ์„ฑ์„ ์ž…์ฆํ•œ๋‹ค. ๋‹ค๋งŒ ํŠน์ • ๋„๋ฉ”์ธ ์ตœ์ ํ™”, ๊ณ„์‚ฐ ๋น„์šฉ ์ƒ์„ธํ™”, ํƒ€ ์˜์—ญ ์ผ๋ฐ˜ํ™” ๊ฐ€๋Šฅ์„ฑ์— ๋Œ€ํ•œ ์ถ”๊ฐ€ ์—ฐ๊ตฌ๊ฐ€ ํ•„์š”ํ•˜๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

Foundational work
Provides the conceptual foundation for experiential learning (Kolb's learning theory and related work) and for the AI scientist, and shares the view of learning as a self-reinforcing loop.
Foundational work
For paper no. 476, paper no. 363 is important background: it interprets the theoretical basis of LLM reasoning (hypothesis discovery, rule learning, and so on) through Peirce and educational psychology.
Foundational work
Reviews brain-inspired modular memory mechanisms and agent design in depth, providing theoretical grounding for agents based on experiential learning.
Alternative approach
Paper no. 763 experiments with and evaluates an LLM-driven framework for generating scientific hypotheses from structured paper data, so its results can be compared with the Agent K case in no. 476.
Alternative approach
Proposes, as a benchmark, an auto-research system that generates new research ideas and runs experiments on its own, showing an automation paradigm different from an implementation of Kolb's learning theory.
Alternative approach
A study that comprehensively examines the capabilities and limits of AI agents from a different perspective.
Alternative approach
Co-Scientist pursues goals similar to Agent K's through multi-agent, LLM-based structured scientific reasoning, but its methodology differs.
Follow-up work
Agent K mechanically implements human learning theories such as Kolb's, complementing the practical feasibility of AI Scientist.
Follow-up work
The Large Language Models Orchestrating Structured Reasoning paper extends data use and benchmarking for LLM agents through real data science competitions and similar settings.
Follow-up work
The Reinforcing clinical decision support through multi-agent systems work likewise applies agentic LLMs to expert-level decision support, making it a good fit for discussing cross-domain extension alongside Agent K.
Follow-up work
Agent K is likewise an LLM-based data science agent that adopts self-verification and self-organisation mechanisms, offering an extended discussion of the GeneAgent system.
Application
InfiAgent-DABench is a representative applied benchmark that evaluates the data analysis ability of LLM agents on real tasks.