DREAM: Deep Research Evaluation with Agentic Metrics

Authors: E. Avraham, Changhao Li, R. Dorfman, Roy Ganz, Oren Nuriel, Amir Dudai, Aviad Aberdam, Noah R. Flynn, Elman Mansimov, Aditya Kalyanpur, Ron Litman | Date: 2026 | DOI: 10.48550/arXiv.2602.18940


Essence

Figure 1: Capturing Overlooked Dimensions of Research Quality. DREAM actively verifies the reasoning of

The paper identifies the 'Mirage of Synthesis', a core problem in evaluating analyst-grade reports generated by Deep Research Agents, and proposes the DREAM framework, built on the capability parity principle, which uses agentic evaluation to effectively verify temporal sensitivity and factuality.

Motivation

Achievement

How

Figure 2: DREAM Overview. Our framework operates in two phases. Left: Protocol Creation, where query-

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: The paper names the fundamental problem of Deep Research evaluation the 'Mirage of Synthesis' and addresses it with the DREAM framework, built on the capability parity principle. By demonstrating the blind spots of existing benchmarks with concrete data, it proposes a significant shift in the evaluation paradigm.

Related Papers

Foundational work
Provides a foundation for designing evaluation frameworks for Deep Research Agents.
Alternative approach
A similar study on methods for verifying the factuality of AI-generated analytical reports.
Alternative approach
A similar study on evaluation methodology and quality-verification frameworks for reports generated by AI agents.
Alternative approach
A similar study on developing frameworks to verify the reliability and factuality of LLM-based research evaluation.
Alternative approach
A related study proposing a method for assessing the quality of LLM-based research reports.
Alternative approach
Takes a similar approach, addressing evaluation criteria and methodological challenges for AI-generated analytical reports.
Alternative approach
A similar study exploring the role of LLMs as an interface for generating and evaluating scientific research outputs.
Alternative approach
A similar study: a benchmark that evaluates the graph-related capabilities of LLMs and addresses the same problem domain.