Small Language Models are the Future of Agentic AI

Authors: Peter Belcak, Greg Heinrich, Shizhe Diao, Yonggan Fu, Xin Dong, Saurav Muralidharan, Yingyan Celine Lin, Pavlo Molchanov | Date: 2025-06-02 | DOI: 10.48550/arXiv.2506.02153


Essence

Current agentic AI systems rely on large language models (LLMs). This paper takes the position that small language models (SLMs) are better suited to the repetitive, specialized tasks that agents perform, are more economical, and will therefore lead the future of agentic AI.

Motivation

Achievement

Figure 1: An illustration of agentic systems with different modes of agency. Left: Language model agency. The language model acts both as the HCI and the orchestrator of tool calls to carry out a task. Right: Code agency. The language model fills the role of the HCI (optionally) while a dedicated controller code orchestrates all interactions.

Two operating modes of agentic systems: on the left, the language model coordinates both the interface and the tool calls; on the right, a code-based controller coordinates the interactions.
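The two modes in Figure 1 differ mainly in who decides which tool to call. A minimal sketch, assuming a hypothetical two-tool registry and a rule-based stub in place of a real language model (none of these names come from the paper):

```python
from typing import Callable, Dict, Optional

# Hypothetical tool registry; the tools are toy stand-ins for real APIs.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calc": lambda expr: str(sum(int(x) for x in expr.split("+"))),
    "search": lambda query: f"results for {query!r}",
}


def looks_like_sum(text: str) -> bool:
    """True for inputs such as '2+3' that the toy calculator can handle."""
    return "+" in text and all(part.isdigit() for part in text.split("+"))


def stub_lm(prompt: str) -> str:
    """Stand-in for a language model that returns the name of a tool."""
    return "calc" if looks_like_sum(prompt) else "search"


def lm_agency(request: str, lm: Callable[[str], str]) -> str:
    """Language model agency: the model picks the tool to call."""
    tool_name = lm(request)
    return TOOLS[tool_name](request)


def code_agency(request: str, phrase: Optional[Callable[[str], str]] = None) -> str:
    """Code agency: controller code picks the tool; a model is optional
    and only rephrases the result for the user (the HCI role)."""
    tool_name = "calc" if looks_like_sum(request) else "search"
    result = TOOLS[tool_name](request)
    return phrase(result) if phrase else result
```

In this toy setup both modes route identically; the point is structural. Under code agency the orchestration logic lives in ordinary code, so any model involved only has to handle a narrow phrasing task.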

  1. Evidence that SLMs are sufficiently capable:
    • Phi-2 (2.7B): commonsense reasoning on par with 30B models, at roughly 15x the speed
    • Phi-3 Small (7B): code generation on par with 70B models
    • Nemotron-H (2-9B): instruction following on par with 30B LLMs, at roughly an order of magnitude lower inference cost
    • xLAM-2-8B: tool-calling performance that surpasses GPT-4o and Claude 3.5
    • DeepSeek-R1-Distill-7B: performance exceeding Claude-3.5-Sonnet and GPT-4o
  2. Economic advantages:
    • Inference efficiency: a 7B SLM is 10-30x cheaper to serve than a 70-175B LLM
    • Fine-tuning agility: fine-tuning completes in a few GPU-hours (versus weeks for an LLM)
    • Edge deployment: can run locally on consumer GPUs
    • Infrastructure operations: less need for parallelization across GPUs and nodes, which lowers maintenance costs
  3. Proposal for heterogeneous agentic systems:
    • Use an SLM as the default model and call an LLM selectively, only when needed
    • Build a modular architecture from a combination of specialized SLMs
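The SLM-first pattern in item 3 can be sketched as a confidence-gated router. This is an illustrative sketch, not the paper's implementation: the `Model` type, the confidence scores, the 0.7 threshold, the stub models, and the 1:20 cost ratio (chosen from within the 10-30x range quoted above) are all assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Tuple


@dataclass
class Model:
    name: str
    cost_per_call: float  # relative cost units (assumed, not measured)
    run: Callable[[str], Tuple[str, float]]  # returns (answer, confidence)


def route(request: str, slm: Model, llm: Model, threshold: float = 0.7):
    """Try the SLM first; escalate to the LLM only on low confidence.

    Returns (answer, name of the model that answered, total relative cost).
    """
    answer, confidence = slm.run(request)
    cost = slm.cost_per_call
    if confidence >= threshold:
        return answer, slm.name, cost
    answer, _ = llm.run(request)
    return answer, llm.name, cost + llm.cost_per_call


# Stub models: the SLM is only confident on short, narrow requests.
def slm_run(request: str) -> Tuple[str, float]:
    confidence = 0.9 if len(request.split()) <= 5 else 0.3
    return f"slm: {request}", confidence


def llm_run(request: str) -> Tuple[str, float]:
    return f"llm: {request}", 0.95


slm = Model("slm-7b", 1.0, slm_run)
llm = Model("llm-70b", 20.0, llm_run)

print(route("extract the invoice date", slm, llm))
print(route("plan a migration across three loosely coupled services", slm, llm))
```

In this sketch an escalated request pays for both calls, so the approach only saves money when most requests are narrow enough for the SLM to handle on its own.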

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 5/5 | Clarity: 4/5 | Overall: 4/5

Overall: This is an important position paper that raises economic, technical, and environmental criticisms of today's LLM-centric agentic AI industry and argues for a shift to SLM-based agentic systems. The NVIDIA researchers build a systematic argument, support its technical feasibility with performance examples from a range of recent SLMs, and sharply point out the mismatch in infrastructure investment on the scale of tens of billions of dollars. The argument would be stronger with large-scale empirical data, validation in production environments, and a systematic analysis of boundary conditions per domain. Given the rapid growth of agentic AI and the industry's interest in AI cost efficiency, it is a valuable contribution that is likely to spark discussion in the community.

Related papers worth reading

Foundational work
370 discusses the practicality of small and mid-sized LLMs (Gemma 2) and the potential of open models, providing a foundation for 760's SLM argument.
Foundational work
Paper 464 lays out the evolutionary path of LLM-based multi-agent systems and a theoretical frame for the SLM versus LLM opposition.
Foundational work
A foundational paper that empirically examines how well economical small language models such as Llama 3 suit agentic use.
Alternative approach
Paper 120 establishes a multi-agent paradigm centered on large LLMs, in contrast to 760's small-model-centric outlook.
Alternative approach
In contrast to large language models as evolutionary optimizers (469), 760 argues for the strength and efficiency of small models.
Alternative approach
101 introduces hierarchical self-reflective agents on top of large LLMs and is worth comparing with SLM-based approaches.
Counterargument / critique
Discusses the advantages and limitations of LLMs relative to SLMs, including AI agent reliability and bias toward large models.
Counterargument / critique
Paper 825 emphasizes the concept of a co-scientist built on large LLMs, offering a perspective that runs counter to the claim that the future of agents is SLM-based.