Mind the blind spots: A focus-level evaluation framework for LLM reviews

Authors: Hyungyu Shin, Jingyu Tang, Yoonjoo Lee, Nayoung Kim, Hyunseung Lim, Ji Yong Cho, Hwajung Hong, Moontae Lee, Ju-ho Kim | Date: 2025 | URL: https://arxiv.org/abs/2502.17086


Essence

Figure 1: We introduce a focus-level evaluation frame-

The paper proposes a focus-level evaluation framework to assess whether LLM-generated paper reviews attend to the same key aspects as human experts, and it uses automatic annotation to systematically analyze the blind spots of LLMs.

Motivation

Achievement

Figure 4: A visualization of focus distributions by target/aspect and strength/weakness, in a descending order of

How

Figure 2: The overall process of automated focus-level evaluation. We first extracted strengths and weaknesses
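
As a rough illustration of what comparing focus at this level could look like, the sketch below tallies hypothetical (target, aspect) labels for human and LLM review points and reports where the LLM's share of attention falls short of the human share. The label names, the sample data, and the helper functions `focus_distribution` and `focus_gaps` are assumptions made for this example, not the paper's actual taxonomy, data, or metric.

```python
from collections import Counter

# Hypothetical focus labels: one (target, aspect) pair per extracted
# strength or weakness. These names and counts are illustrative only.
human_focus = [
    ("method", "novelty"),
    ("experiment", "validity"),
    ("experiment", "validity"),
    ("writing", "clarity"),
]
llm_focus = [
    ("method", "novelty"),
    ("method", "novelty"),
    ("writing", "clarity"),
    ("writing", "clarity"),
]


def focus_distribution(labels):
    """Return the share of review points devoted to each focus label."""
    counts = Counter(labels)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def focus_gaps(human, llm):
    """Return (label, LLM share minus human share), most under-covered first."""
    h = focus_distribution(human)
    m = focus_distribution(llm)
    labels = set(h) | set(m)
    gaps = {label: m.get(label, 0.0) - h.get(label, 0.0) for label in labels}
    # Sort by gap, then by label, so ties come out in a stable order.
    return sorted(gaps.items(), key=lambda kv: (kv[1], kv[0]))


gaps = focus_gaps(human_focus, llm_focus)
for label, gap in gaps:
    print(label, f"{gap:+.2f}")
```

Under these made-up labels, the most negative gap marks the focus that the LLM under-attends relative to human reviewers, which is one simple way to read a blind spot.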

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: An original study that adds a new dimension to the evaluation of LLM reviews. Its automated evaluation pipeline systematically exposes the structural blind spots of LLMs and offers practical guidance on how to use LLMs in the academic review process.

Related papers worth reading

Foundational work
Paper 870 points out the vulnerability of review-quality evaluation based on text matching, which puts the limits and strengths of Paper 537's automatic-annotation-based evaluation in a realistic light.
Foundational work
The focus-level evaluation framework of Paper 537 can serve as an in-depth diagnostic and evaluation model for the decline in peer-review quality argued in Paper 628.
Foundational work
Paper 537 presents a framework for focus-level evaluation of LLM reviews and the blind-spot problem, and it is the theoretical basis for the review-bias analysis in Paper 128.
Alternative approach
Paper 592 generates and evaluates reviews directly with an LLM fine-tuned on expert review data. It concentrates on the quality of LLM reviews and offers an alternative view of the blind-spot issue.
Alternative approach
A study that implements LLM-based summarization and synthesis of academic documents with a different approach.
Alternative approach
Related work on an automated paper evaluation system that includes integrated visual and textual analysis.
Alternative approach
A paper on LLM prompt tuning and review-quality improvement, useful for examining how focus-level evaluation and prompt engineering complement each other.
Alternative approach
Paper 680 asks whether LLM reviews actually miss critical problems and, through reason-based evaluation, offers a critique from a different angle than the blind-spot detection of Paper 537.
Alternative approach
REMOR applies multi-objective reinforcement learning to LLM-based peer review generation, aiming for deep and balanced feedback. It covers a similar topic from the side of evaluating and improving LLM reviews.
Alternative approach
Automatically evaluating the paper reviewing capability of LLMs presents different metrics and a different framework for assessing LLM reviewing ability, which allows aspect-wise evaluation methodologies to be compared.
Alternative approach
Paper 537 proposes a scheme for analyzing and evaluating the characteristics of LLM reviews, which complements the OpenReview-based AI evaluation emphasized in Paper 591.
Follow-up work
Paper 628 argues for reforms to the review-quality crisis at AI conferences (bidirectional feedback, rewards, and so on) and discusses fundamental solutions for review evaluation.
Follow-up work
Paper 679 (ReviewEval) extends the framework of evaluation components for AI-based paper reviews, fleshing out Paper 537's focus-level evaluation scheme along multiple dimensions.
Follow-up work
ReviewAgents analyzes whether AI reviews capture important issues similar to those raised by human reviewers, providing a practical application of an LLM evaluation framework.
Application
Paper 654 builds a large-scale peer-review dataset with consistency guarantees, providing real data suited to applying the evaluation framework proposed in Paper 537.