Towards a Science of Scaling Agent Systems

Authors: Yubin Kim, Ken Gu, Chanwoo Park, Chunjong Park, Samuel Schmidgall, A. Ali Heydari, Yao Yan, Zhihan Zhang, Yuchen Zhuang, Mark Malhotra, Paul Pu Liang, Hae Won Park, Yuzhe Yang, Xuhai Xu, Yilun Du, Shwetak Patel, Tim Althoff, Daniel McDuff, Xin Liu | Date: 2025-12-17 | DOI: 10.48550/arXiv.2512.08296


Essence

Figure 1: Performance as a function of model intelligence (Intelligence Index) and agent architecture. Across three LLM families (OpenAI, Google, Anthropic), multi-agent system (MAS) variants show different scaling behavior relative to the single-agent system (SAS).

This paper is the first study to systematically derive quantitative scaling principles (scaling laws) that determine the performance of language-model-based agent systems. By analyzing the interactions among tool usage, model capability, and task characteristics, it presents a predictive framework that quantifies when a multi-agent system (MAS) improves performance and when it degrades it.

Motivation

Achievement

Figure 2: Performance comparison of single-agent and multi-agent systems across task domains. Web navigation and financial reasoning show different architecture effects.

  1. Three dominant scaling patterns identified
    • Tool-coordination trade-off (β = -0.267, p < 0.001): on tool-heavy tasks (e.g., software engineering with 16 tools), MAS struggles with complex tool orchestration because the per-agent token budget shrinks.
    • Capability saturation (capability ceiling) (β = -0.404, p < 0.001): once single-agent baseline performance exceeds ~45%, adding agents degrades performance because coordination cost outweighs the incremental gain.
    • Topology-dependent error amplification: independent agents amplify errors 17.2×, while centralized coordination contains this to 4.4×.
  2. Task-conditional architecture performance
    • Centralized coordination: +80.8% performance gain on parallelizable financial reasoning
    • Decentralized coordination: +9.2% improvement on dynamic web navigation (+0.2% vs SAS)
    • All MAS variants: -39% to -70% performance degradation on sequential reasoning tasks
  3. Predictive framework established
    • Cross-validated R² = 0.524: performance on held-out task domains can be predicted without dataset-specific parameters
    • 87% accuracy in predicting the optimal architecture
    • GPT-5.2 out-of-sample validation: MAE = 0.071; four of the five scaling principles were confirmed to generalize to a newer, previously unseen model
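
The R² and MAE figures above are standard regression metrics. As a minimal sketch of how such scores could be computed for held-out predictions, the snippet below uses the textbook definitions. The function names and the sample scores are hypothetical, chosen only for illustration, and the paper's own evaluation pipeline (including its cross-validation scheme) may differ.

```python
from typing import Sequence


def _check(y_true: Sequence[float], y_pred: Sequence[float]) -> None:
    # Both inputs must be non-empty and aligned element by element.
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted scores."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    _check(y_true, y_pred)
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all observed values are equal")
    return 1.0 - ss_res / ss_tot


# Hypothetical observed vs. predicted task scores (NOT data from the paper).
observed = [0.30, 0.45, 0.60, 0.75]
predicted = [0.35, 0.40, 0.65, 0.70]

if __name__ == "__main__":
    print(f"MAE = {mean_absolute_error(observed, predicted):.3f}")
    print(f"R^2 = {r_squared(observed, predicted):.3f}")
```

Read this way, an MAE of 0.071 means predictions miss the observed score by about 0.07 on average, and an R² of 0.524 means the model explains roughly half of the variance in held-out performance.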

How

Figure 3: Cost-performance trade-off across model families and architectures. The relative importance of coordination overhead varies with model capability and task type.

Originality

Limitation & Further Study

Evaluation

Overall assessment: As the first large-scale controlled experiment to quantify the scaling principles of agent systems, this paper carefully refutes the conventional wisdom that "multi-agent = always a win" and demonstrates that task-architecture alignment is the key to success. In particular, the tool-coordination trade-off, capability saturation, …

Related papers worth reading

Foundational work
AutoGen is a multi-agent LLM application framework and a practical implementation basis for the multi-agent systems analyzed in this scaling study.
Foundational work
The survey on hypothesis discovery and rule learning provides a theoretical framework for how multi-agent systems discover knowledge, forming a basis for research on agent scaling principles.
Foundational work
A comprehensive, in-depth review of the factors that govern the performance of LLM-based agent systems (tool use, scaling, collaboration), providing a core foundation for scaling-law research.
Foundational work
A review that comprehensively covers the scaling laws and design principles of intelligent agent systems, giving background for this paper's quantification of multi-agent scaling laws.
Foundational work
A study that quantifies the scaling principles of agent systems, providing a foundation for optimizing the multi-agent design of Earth-Agent, which processes multispectral data.
Alternative approach
Paper 295 covers multi-agent orchestration and information retrieval in multi-source environments, allowing different implementation approaches to be compared against MAS performance scaling laws.
Alternative approach
The Sparks of science paper focuses on enhancing creativity through structural patterns rather than on agent scaling, which makes it complementary.
Alternative approach
Building on a systematic discussion of the performance, scalability, and reliability of large-scale AI agent systems, it highlights the need for new benchmark evaluations.
Follow-up work
The study quantifying agent scaling principles is complementary work that extends the framework proposed by the science of AI agent reliability into performance prediction.
Application
A case that quantitatively analyzes agent-system performance scaling principles on real MAS (Multi-Agent Systems), showing how a theoretical review leads into empirical study.
Counterargument/Critique
Presents research on the risks and hazards of multi-agent systems as a contrast, explaining performance degradation and instability under scaling.