AutoReproduce: Automatic AI Experiment Reproduction with Paper Lineage

Authors: Xuanle Zhao, Zilin Sang, Yuxuan Li, Qi Shi, Wei Zhao | Date: 2025 | DOI: arXiv:2505.20662v2


Essence

Figure 1: The paper content, instructions and data processing code (if necessary) are provided for each

This paper proposes the paper lineage algorithm and AUTOREPRODUCE, a multi-agent framework, to automatically reproduce the experiments described in research papers. Unlike prior methods, which automated only parts of the task, this approach performs end-to-end experiment reproduction and also verifies that the generated code is executable.

Motivation

Achievement

• Technical results: AUTOREPRODUCE outperforms existing agent baselines on all five REPRODUCEBENCH evaluation metrics, by a peak margin of over 70%.

• Execution results: Compared with the official implementations, it shows an average performance gap of only 22.1% on the 89.74% of experiments that are executable.

• Evaluation framework: Introduces REPRODUCEBENCH, which covers 13 papers, together with multi-level metrics that assess reproduction fidelity and execution fidelity.
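
To make the executable rate and the average performance gap concrete, here is a minimal sketch of how such figures could be computed. It assumes the gap is the absolute difference between the reproduced and official scores divided by the official score, averaged over executable runs only. That definition, the function names, and the sample scores are illustrative assumptions, not taken from the paper.

```python
from typing import Optional, Sequence


def relative_gap(reproduced: float, official: float) -> float:
    """Relative gap |reproduced - official| / |official| (assumed definition)."""
    if official == 0:
        raise ValueError("official score must be non-zero")
    return abs(reproduced - official) / abs(official)


def summarize(runs: Sequence[tuple[Optional[float], float]]) -> tuple[float, float]:
    """Return (executable_rate, mean_gap) over (reproduced, official) pairs.

    A reproduced score of None marks a run whose generated code failed to execute.
    """
    executable = [(r, o) for r, o in runs if r is not None]
    if not executable:
        return 0.0, float("nan")
    rate = len(executable) / len(runs)
    mean_gap = sum(relative_gap(r, o) for r, o in executable) / len(executable)
    return rate, mean_gap


# Hypothetical scores, made up for illustration only.
runs = [(0.80, 1.00), (0.45, 0.50), (None, 0.70), (0.60, 0.60)]
rate, gap = summarize(runs)
print(f"executable: {rate:.2%}, mean gap: {gap:.2%}")
```

Averaging over executable runs only mirrors how the summary pairs the two numbers: the gap is reported on the subset of experiments that run at all.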

How

• Paper lineage algorithm: analyzes the citation graph and the associated code repositories to identify domain-specific consensus and implementation conventions

• Three-stage summarization process: extracts information in the order overall content summary → method details → experimental setup

• MinerU-based PDF processing: preserves important information such as equations and tables

• Unit test generation: verifies code executability through batch sampling

• Multi-agent collaboration: splits roles between a research agent (text tasks) and a code agent (code tasks)
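
The paper lineage step in the list above can be pictured with a small sketch. This is not the authors' algorithm: it simply assumes that useful lineage papers are cited works that ship a code repository and overlap most with the target paper's topic, ranked here by Jaccard similarity over keywords. The `Paper` class, the keyword-based ranking, and the example URLs are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Paper:
    title: str
    repo_url: Optional[str] = None  # code repository, if one is known
    keywords: set[str] = field(default_factory=set)
    references: list["Paper"] = field(default_factory=list)


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two keyword sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def paper_lineage(target: Paper, top_k: int = 3) -> list[Paper]:
    """Pick cited papers that have code and are closest in topic to the target."""
    candidates = [p for p in target.references if p.repo_url]
    ranked = sorted(
        candidates,
        key=lambda p: jaccard(target.keywords, p.keywords),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical citation neighborhood, for illustration only.
refs = [
    Paper("Ref A", "https://example.com/a", {"time-series", "forecasting", "linear"}),
    Paper("Ref B", None, {"time-series", "forecasting", "transformer"}),
    Paper("Ref C", "https://example.com/c", {"vision", "transformer"}),
    Paper("Ref D", "https://example.com/d", {"nlp"}),
]
target = Paper("Target", None, {"time-series", "forecasting", "transformer"}, refs)
lineage = paper_lineage(target, top_k=2)
print([p.title for p in lineage])
```

In this toy example Ref B is the closest match by topic but is skipped because it has no repository, which reflects the idea that lineage papers are valuable mainly for the implementation conventions their code exposes.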

Originality

• Introduction of the paper lineage concept: extracting implicit domain knowledge from citation relationships is a new approach

• End-to-end automatic reproduction framework: goes beyond earlier partial automation to build the full pipeline

• Executability verification included: adds mechanisms that evaluate not only code generation but also executability and fidelity

• Comprehensive evaluation benchmark: consists of manually verified reference code for 13 papers and multi-level metrics

Limitation & Further Study

• Benchmark scale: with only 13 papers included, generalization across diverse AI domains is limited

• Scope of paper lineage: citation analysis may not capture all very deep domain knowledge (e.g., the informal conventions of a particular field)

• Code quality evaluation: the 22.1% performance gap is still substantial, and limitations on particular complex implementations are not examined

• Further study: (1) validate generalization on a larger set of papers, (2) optimize the paper lineage method for specific domains, (3) add finer-grained adjustment mechanisms to reduce the performance gap

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ paper lineage๋ผ๋Š” ํ˜์‹ ์  ๊ฐœ๋…๊ณผ multi-agent ๊ธฐ๋ฐ˜์˜ end-to-end ์‹คํ—˜ ์žฌํ˜„ ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•˜์—ฌ AI ์—ฐ๊ตฌ์˜ ์žฌํ˜„์„ฑ ๋ฌธ์ œ๋ฅผ ์‹ค์งˆ์ ์œผ๋กœ ํ•ด๊ฒฐํ•˜๋ ค๋Š” ๊ฐ€์น˜ ์žˆ๋Š” ์‹œ๋„์ด๋‹ค. ์‹คํ–‰์„ฑ ๊ฒ€์ฆ์„ ํฌํ•จํ•œ ํฌ๊ด„์  ํ‰๊ฐ€ ๋ฐฉ์‹๊ณผ ์šฐ์ˆ˜ํ•œ ์‹คํ—˜ ๊ฒฐ๊ณผ๊ฐ€ ๊ฐ•์ ์ด๋‚˜, ๋ฒค์น˜๋งˆํฌ ๊ทœ๋ชจ์™€ ๋„๋ฉ”์ธ ์ผ๋ฐ˜ํ™” ๋Šฅ๋ ฅ ์ธก๋ฉด์—์„œ ๊ฐœ์„ ์ด ํ•„์š”ํ•˜๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
581์˜ ๋Œ€๊ทœ๋ชจ ์—ฐ๊ตฌ ๊ด€๋ จ ๋ฐ์ดํ„ฐ์…‹์€ 145์˜ ๋…ผ๋ฌธ ๊ณ„๋ณด ๊ธฐ๋ฐ˜ ์ž๋™ํ™” ๋ฐฉ๋ฒ•๋ก ์— ๊ธฐ๋ฐ˜ ๋ฐ์ดํ„ฐ๋ฅผ ์ œ๊ณตํ•œ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
Deep Research Agent์˜ ํ‰๊ฐ€ ํ”„๋ ˆ์ž„์›Œํฌ ์„ค๊ณ„์— ๊ธฐ๋ฐ˜์„ ์ œ๊ณตํ•˜๋Š” ์—ฐ๊ตฌ์ด๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
145๋Š” LLM์˜ ๊ตฌ์กฐ์  ์ถ”๋ก  ๋Šฅ๋ ฅ ํ‰๊ฐ€๋ฅผ ์œ„ํ•œ ์ด๋ก ์ ยท๋ฐฉ๋ฒ•๋ก ์  ๊ธฐ๋ฐ˜์„ ์ œ๊ณตํ•˜์—ฌ GraphInstruct ์„ค๊ณ„์— ํ™œ์šฉ๋œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
145๋ฒˆ ๋…ผ๋ฌธ์€ AI ๊ธฐ๋ฐ˜ ์‹คํ—˜ ์žฌํ˜„ ์ž๋™ํ™” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•ด, 925๋ฒˆ ์žฌํ˜„์„ฑ ์œ„๊ธฐ ํ˜„์ƒ์— ๋Œ€ํ•œ ์†”๋ฃจ์…˜์  ์‹œ๊ฐ์„ ์ œ๊ณตํ•œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
Autokaggle ๋…ผ๋ฌธ์€ ์ž๋™ ์žฌํ˜„/์‹คํ—˜ ์„ธํŒ… ์ž๋™ํ™”์— ์ดˆ์ ์„ ๋งž์ถ”๋Š” ๋ฐฉ๋ฒ•๋ก ์œผ๋กœ, paper-lineage ๋Œ€์‹  workflow ์ค‘์‹ฌ์œผ๋กœ ์ ‘๊ทผํ•ฉ๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
AI ์—ฐ๊ตฌ ๋…ผ๋ฌธ์˜ ์ž๋™ ์žฌํ˜„ ๋ฐ ์ฝ”๋“œ ์ƒ์„ฑ์„ ์œ„ํ•œ ์œ ์‚ฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
ํ•™์ˆ  ๋…ผ๋ฌธ ๊ตฌํ˜„ ์ž๋™ํ™”๋ฅผ ์œ„ํ•œ LLM ๊ธฐ๋ฐ˜ ์‹œ์Šคํ…œ์˜ ๊ด€๋ จ ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
326๋ฒˆ ๋…ผ๋ฌธ์€ AI ์—ฐ๊ตฌ์ž‘์—…์˜ ์ž๋™ํ™” ๊ฐ€๋Šฅ์„ฑ์„ ์‹คํ—˜์  ๊ด€์ ์—์„œ ๊ฒ€์ฆํ•˜๋ฏ€๋กœ 145๋ฒˆ์˜ ๋…ผ๋ฌธ ๊ณ„๋ณด ๊ธฐ๋ฐ˜ ์ž๋™ ์žฌํ˜„ ์‹œ์Šคํ…œ๊ณผ ์‹ฌ์ธต์ ์œผ๋กœ ๋Œ€์กฐํ•ด๋ณผ ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
AI ๊ธฐ๋ฐ˜ ๊ณผํ•™ ์‹คํ—˜ ์žฌํ˜„ ์ž๋™ํ™”์— ๊ด€ํ•œ ์—ฐ๊ตฌ๋กœ, ์งˆ์˜์‘๋‹ต(QA)์ด ์•„๋‹Œ ๋ณต์žกํ•œ ์‹คํ—˜์  ๊ณผํ•™์  ์ž‘์—…์˜ ์‹ ๋ขฐ์„ฑ๊ณผ ๊ฒ€์ฆ ๊ฐ€๋Šฅ์„ฑ ๊ด€์ ์—์„œ ๋Œ€๊ตฌ(MedBioLM) ์ ‘๊ทผ์„ ๋ณด์™„ํ•ฉ๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๋…ผ๋ฌธ์œผ๋กœ๋ถ€ํ„ฐ ์‹คํ–‰ ๊ฐ€๋Šฅํ•œ ์ฝ”๋“œ๋ฅผ ์ž๋™ ์ƒ์„ฑํ•˜๋Š” ์œ ์‚ฌํ•œ ๋ชฉํ‘œ์˜ ๊ด€๋ จ ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๊ณผํ•™ ์‹คํ—˜ ์žฌํ˜„์„ฑ ๋ฐ ์ž๋™ํ™” ์ฝ”๋“œ ์ƒ์„ฑ์˜ ๊ด€๋ จ ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
์ธ์šฉ ๊ด€๊ณ„ ๋ถ„์„๊ณผ ์ฝ”๋“œ ์žฌํ˜„์„ ์—ฐ๊ฒฐํ•˜๋Š” ์œ ์‚ฌํ•œ ์ ‘๊ทผ ๋ฐฉ์‹์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
698์€ ๋Œ€๊ทœ๋ชจ ์‹คํ—˜์˜ ์žฌํ˜„์„ฑ ๋ณด์žฅ์„ ์œ„ํ•œ AI ์ง€์› ์›Œํฌํ”Œ๋กœ์šฐ๋ฅผ ์ œ์•ˆํ•˜๋ฉฐ, 145์˜ ์ž๋™ ์žฌํ˜„ ํ”„๋ ˆ์ž„์›Œํฌ์™€ ๋Œ€์กฐ์ ์œผ๋กœ ํ˜„์‹ค์  ์ ์šฉ ๋ฌธ์ œ๋ฅผ ๋‹ค๋ฃฌ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
828์€ AI ์—ฐ๊ตฌ์˜ End-to-End ์ž๋™ํ™”๋กœ ํ™•์žฅํ•˜๋Š” ๋ฐฉ์•ˆ์„ ์ œ์‹œํ•˜์—ฌ, 145์˜ ์ž๋™ ์‹คํ—˜ ์žฌํ˜„์„ ๋” ํฌ๊ด„์ ์ธ ์—ฐ๊ตฌ ์ž๋™ํ™”๋กœ ๋ฐœ์ „์‹œํ‚จ๋‹ค.
์‘์šฉ ์‚ฌ๋ก€
594๋ฒˆ ๋…ผ๋ฌธ์€ ๋ชจ๋ธ์„ค๊ณ„ ์ž๋™ํ™”์—์„œ LLM ๊ธฐ๋ฐ˜ ์—์ด์ „ํŠธ ํ™œ์šฉ ์‹ค์ฆ์„ ๋ณด์—ฌ์ฃผ์–ด, 145๋ฒˆ์˜ ์‹คํ—˜ ๋‹จ์ˆœ ๋ฐ˜๋ณต์„ฑ์„ ์‹ค์งˆ์  ํ˜์‹ ๊ณผ ํ†ตํ•ฉํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
์‘์šฉ ์‚ฌ๋ก€
145 ๋…ผ๋ฌธ์€ ๋…ผ๋ฌธ ๊ณ„๋ณด๋ฅผ ํ™œ์šฉํ•œ ์ž๋™ํ™” AI ์‹คํ—˜ ์žฌํ˜„์˜ ๊ตฌ์ฒด์  ์‚ฌ๋ก€๋กœ, 2199๋ฒˆ์—์„œ ์ฃผ์žฅํ•˜๋Š” foundation model ๊ธฐ๋ฐ˜ ๊ณผํ•™ ํŒจ๋Ÿฌ๋‹ค์ž„ ํ˜์‹ ์„ ์‹ค์ฆ์ ์œผ๋กœ ๋ณด์—ฌ์ค€๋‹ค.