Scholawrite: A dataset of end-to-end scholarly writing process

Authors: Linghe Wang, Minhwa Lee, R. Volkov, L. Chau, Dongyeop Kang | Date: 2025 | DOI: N/A


Essence

Figure 1: An example scholarly writing process with an-

This paper presents SCHOLAWRITE, the first dataset to track the entire process of writing a scholarly paper. About 62K text changes collected on Overleaf over four months are annotated with cognitive writing intentions, and the paper uses them to analyze the non-linear human writing process and the limitations of current LLMs.

Motivation

Achievement

Figure 3: The number of intentions per writing session

1. Overleaf Chrome extension: a tool that securely records real-time keystrokes and text diffs while protecting privacy.
2. SCHOLAWRITE dataset: about 62K keystroke-based text changes, collected over four months from five computer science preprints and annotated with cognitive intentions.
3. Cognitive-theory-based taxonomy: 15 fine-grained writing-intention categories built on the Flower and Hayes theory.
4. Human-LLM gap analysis: shows that scholarly writing is highly non-linear and multi-intentional, and demonstrates empirically that current LLMs (GPT-5, Qwen) imitate only surface-level edits and cannot sustain next-intention prediction or complex cognitive revision.
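To make the notion of a keystroke-level "text diff" concrete, here is a minimal sketch using Python's standard `difflib`. It is not the extension's actual logging code: the function name `text_changes` and the sample strings are hypothetical, and the real tool may represent changes differently.

```python
import difflib


def text_changes(before: str, after: str):
    """Return (operation, removed_text, added_text) for each changed span.

    Hypothetical helper: compares two snapshots of the editor buffer
    character by character and skips the unchanged regions.
    """
    matcher = difflib.SequenceMatcher(a=before, b=after, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":  # tag is one of: replace, delete, insert
            changes.append((tag, before[i1:i2], after[j1:j2]))
    return changes


# Two made-up snapshots of the same sentence, a few keystrokes apart.
snapshot_before = "We propose a dataset."
snapshot_after = "We propose a new dataset."
for change in text_changes(snapshot_before, snapshot_after):
    print(change)
```

Each tuple could then be stored with a timestamp and later labeled with a writing intention, which is roughly the kind of record the dataset description implies.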

How

Figure 2: Transition probability matrix between writing
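A transition probability matrix of this kind can be estimated by counting how often one label is followed by another and normalizing each row. The sketch below assumes the matrix is computed over a chronological sequence of intention labels; the function name and the three labels are illustrative placeholders, not the dataset's 15 categories or the authors' code.

```python
from collections import Counter, defaultdict


def transition_probabilities(labels):
    """Estimate P(next label | current label) from an ordered label sequence."""
    counts = defaultdict(Counter)
    # Count each adjacent (current, next) pair in the sequence.
    for current, nxt in zip(labels, labels[1:]):
        counts[current][nxt] += 1
    matrix = {}
    for current, followers in counts.items():
        total = sum(followers.values())
        # Normalize the row so its probabilities sum to 1.
        matrix[current] = {nxt: n / total for nxt, n in followers.items()}
    return matrix


# Placeholder labels for one hypothetical writing session.
session = ["planning", "drafting", "drafting", "revising", "drafting"]
for current, row in transition_probabilities(session).items():
    print(current, row)
```

A row with probability mass spread over many columns, rather than concentrated on one successor, is what a non-linear writing process would look like in such a matrix.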

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: SCHOLAWRITE is the first large-scale dataset to capture the end-to-end scholarly writing process in real time, and it identifies a fundamental gap between human cognitive processes and current LLMs. Its methodological rigor and practical value are high, but the limited sample and the lack of annotation-consistency validation need improvement. It is expected to contribute substantially to the development of scholarly writing support tools and to human-centered AI research.

Related Papers

Foundational work
The CoAuthor paper releases data on the human-LLM collaborative writing process, providing baseline material for analyzing the scholarly writing behavior that the Scholawrite dataset tracks.
Foundational work
A large-scale arXiv analysis of LLM usage patterns in scholarly papers provides a theoretical basis for the data-analysis perspective of Scholawrite's real-world writing dataset.
Foundational work
Paper No. 703, which builds a benchmark covering the entire scholarly writing process, serves as a basis for developing the integrated scholarly writing system of paper No. 702, including citation generation.
Foundational work
Systematically analyzes the evolving roles and limits of LLMs and humans in scientific automation, providing context for empirical studies based on paper-writing datasets.
Alternative approach
Focuses on creativity and structured automation in AI-based scientific ideation and writing, so differences in creativity can be compared alongside research on the human writing process and LLM assistance.
Alternative approach
In distinguishing humans from LLMs and comparing writing behavior, the Scholawrite paper analyzes in depth where human and LLM expression is similar and where it differs.
Follow-up work
The paper "The AI writing on the wall" empirically analyzes how the adoption of AI writing affects real scholarly paper writing and the academic ecosystem.
Follow-up work
Human-LLM Coevolution dynamically analyzes human-LM interaction within the actual writing process and how that interaction changes.
Follow-up work
Scholawrite tracks LLM-assisted writing patterns with real keystroke data, complementing the data-driven analysis of paper No. 280.
Application
Scholawrite focuses on building a dataset of the real scholarly writing process, demonstrating a practical application of deep paper-understanding and question-answering models.
Counterpoint/Critique
The paper "Does writing with language models reduce content diversity?" quantitatively studies the negative effects of AI writing tools on creativity and diversity, so it offers a contrasting viewpoint to Scholawrite's empirical analysis.