LAG: LLM agents for Leaderboard Auto Generation on Demanding

์ €์ž: Jian Wu, Jiayu Zhang, Dongyuan Li, Linyi Yang, Aoxiao Zhong, Renhe Jiang, Qingsong Wen, Yue Zhang | ๋‚ ์งœ: 2025 | URL: https://arxiv.org/abs/2502.18209 📄 PDF


Essence

Figure 2: The League framework for leaderboard automatic generation. In Stage 1, we automatically …

League is a framework that automatically collects papers from arXiv and academic journals, uses LLMs to extract and integrate their experimental results, and thereby generates leaderboards dynamically and automatically.
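The integration step can be pictured with a minimal sketch. It assumes the LLM extraction stage has already turned each paper into structured records; the `ResultRecord` schema, the `build_leaderboards` function and the toy data are hypothetical illustrations, not League's actual code. Results are grouped by dataset, metric and experimental setting, following the point made in this review that fair comparison must account for experimental settings.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ResultRecord:
    """One extracted result (hypothetical schema, not League's actual format)."""
    paper_id: str
    method: str
    dataset: str
    metric: str
    setting: str  # experimental setting, e.g. "zero-shot" or "fine-tuned"
    score: float


def build_leaderboards(records, higher_is_better=True):
    """Rank methods separately for each (dataset, metric, setting) group.

    Records with different settings never share a leaderboard, and each
    method keeps only its best reported score within a group.
    """
    best = defaultdict(dict)  # group key -> {method: (score, paper_id)}
    for r in records:
        key = (r.dataset, r.metric, r.setting)
        current = best[key].get(r.method)
        if current is None:
            improved = True
        elif higher_is_better:
            improved = r.score > current[0]
        else:
            improved = r.score < current[0]
        if improved:
            best[key][r.method] = (r.score, r.paper_id)

    leaderboards = {}
    for key, by_method in best.items():
        rows = [(method, score, pid) for method, (score, pid) in by_method.items()]
        # Best first: descending for higher-is-better metrics, ascending otherwise.
        rows.sort(key=lambda row: row[1], reverse=higher_is_better)
        leaderboards[key] = rows
    return leaderboards


# Toy records with placeholder IDs and scores, purely for illustration.
records = [
    ResultRecord("p1", "MethodA", "DatasetX", "accuracy", "zero-shot", 71.2),
    ResultRecord("p2", "MethodB", "DatasetX", "accuracy", "zero-shot", 74.5),
    ResultRecord("p3", "MethodA", "DatasetX", "accuracy", "fine-tuned", 88.0),
    ResultRecord("p4", "MethodA", "DatasetX", "accuracy", "zero-shot", 72.0),
]
boards = build_leaderboards(records)
# The zero-shot board ranks MethodB (74.5) above MethodA (72.0, its best zero-shot score).
for key, rows in boards.items():
    print(key, rows)
```

Here the fine-tuned 88.0 lands on its own board instead of outranking the zero-shot results. The single `higher_is_better` flag is a simplification; a real system would need a per-metric direction (for example, for error rates).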

Motivation

Achievement

Figure 3: The example leaderboard generated by League. Comparing with the Leaderboard of …

How


Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: League๋Š” ๊ธ‰์ฆํ•˜๋Š” ํ•™์ˆ  ๋…ผ๋ฌธ์— ๋Œ€์‘ํ•˜์—ฌ ์ž๋™์œผ๋กœ ์ตœ์‹  ๋ฆฌ๋”๋ณด๋“œ๋ฅผ ์ƒ์„ฑํ•˜๋Š” ํ˜์‹ ์  ํ”„๋ ˆ์ž„์›Œํฌ์ด๋ฉฐ, ์‹คํ—˜ ์„ค์ •์„ ํฌํ•จํ•œ ๊ณต์ •ํ•œ ๋น„๊ต๋ผ๋Š” ์ƒˆ๋กœ์šด ๊ด€์ ์„ ์ œ์‹œํ•œ๋‹ค. ์ธ๊ฐ„ ์„ฑ๋Šฅ์— ๊ทผ์ ‘ํ•œ ๊ฒฐ๊ณผ์™€ 5-10๋ฐฐ์˜ ํšจ์œจ์„ฑ ํ–ฅ์ƒ์œผ๋กœ ์‹ค์งˆ์  ๊ฐ€์น˜๋ฅผ ์ž…์ฆํ•˜๋‚˜, LLM ์˜ค๋ฅ˜ ์ฒ˜๋ฆฌ ๋ฐ ๋‹ค๋ถ„์•ผ ์ผ๋ฐ˜ํ™” ๊ฐœ์„ ์ด ํ•„์š”ํ•˜๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

Foundational work
The GPT-4 Technical Report is directly relevant to LAG as a reference for its base model and evaluation, since it documents the kind of benchmark performance that leaderboards measure and that LAG aims to track automatically.
Foundational work
A method for automating academic surveys with large language models; it provides the technical groundwork for building a framework that generates leaderboards automatically.
Foundational work
Prior work that provides the methodological groundwork for automatically extracting experimental results and generating leaderboards.
Alternative approach
603 is a system that automatically drafts research ideas. Like 1088, it aims to process paper data automatically, but its innovation lies elsewhere.
Alternative approach
Implements LLM-based extraction of information from academic papers and leaderboard generation with a different methodology.
Alternative approach
Takes a different approach to automatically collecting and integrating machine learning experimental results.
Alternative approach
Applies LLM-based analysis of academic papers and information extraction in a different context.
Alternative approach
Compares various RAG-based leaderboard automation frameworks, enabling in-depth benchmarking and methodological extensions.
Alternative approach
An open ecosystem for AI-based paper data and performance evaluation; it offers a point of comparison for the scalability and societal impact of League's automatic leaderboard generation.
Alternative approach
Presents another framework that automatically extracts and compares results from academic papers.
Follow-up work
108 is a RAG-based, modular multi-document summarization pipeline; it could support the extraction and organization of paper information that 1088 needs for LLM-based automatic leaderboard generation.
← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •