SciEvo: A 2 Million, 30-Year Cross-disciplinary Dataset for Temporal Scientometric Analysis

Authors: Yiqiao Jin, Yijia Xiao, Yiyang Wang, Jindong Wang | Date: 2024 | DOI: 10.48550/ARXIV.2410.09510


Essence

Figure 1: Keyword trajectories reflect critical paradigm shifts in AI and epidemiology research over

This paper introduces Scito2M, a longitudinal scientometrics dataset spanning 30 years that contains more than 2 million academic papers collected from arXiv, together with their citation graph. The authors use it to analyze how academic terminology, citation patterns, and interdisciplinary knowledge exchange change over time, with the aim of understanding how scientific knowledge is created and disseminated.

Motivation

Achievement

Figure 2: Evolution in the ranks of math and machine-learning terms among all keywords over time.

Dataset construction: Provides Scito2M, a comprehensive scientometrics dataset of 2.1 million papers across 156 categories over a 34-year span, with detailed metadata and a citation graph.
Paradigm shift discovery: Machine learning-related terms surged from an annual average of 0.31 before 2010 to 9.5 after 2015.
Citation pattern analysis: Interdisciplinary citations account for less than 9% of all citations, while within-discipline citations account for more than 91%, confirming strong homophily.
Field-level citation differences: The median age of citation is 2.48 years for LLM research versus 9.71 years for oral history, demonstrating citation amnesia in applied fields.
Analysis tools: Visualization and analysis tools are available on GitHub, Kaggle, and HuggingFace.
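The homophily and citation-age figures above can be illustrated with a small sketch. This is not the authors' pipeline: the toy `papers` and `citations` records, the function names, and the simplification that "age of citation" is the publication-year gap between the citing and the cited paper are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical toy records: paper id -> (primary discipline, publication year).
papers = {
    "p1": ("cs", 2023),
    "p2": ("cs", 2021),
    "p3": ("cs", 2020),
    "p4": ("math", 2015),
    "p5": ("history", 2022),
    "p6": ("history", 2010),
}

# Hypothetical citation edges: (citing paper, cited paper).
citations = [
    ("p1", "p2"),
    ("p1", "p3"),
    ("p1", "p4"),
    ("p2", "p3"),
    ("p5", "p6"),
]


def cross_discipline_share(papers, citations):
    """Fraction of citation edges whose two endpoints differ in discipline."""
    if not citations:
        raise ValueError("need at least one citation edge")
    cross = sum(1 for src, dst in citations if papers[src][0] != papers[dst][0])
    return cross / len(citations)


def median_citation_age(papers, citations, discipline):
    """Median year gap (citing year minus cited year) for citing papers in `discipline`."""
    ages = [
        papers[src][1] - papers[dst][1]
        for src, dst in citations
        if papers[src][0] == discipline
    ]
    if not ages:
        raise ValueError(f"no citations found for discipline {discipline!r}")
    return median(ages)


print(cross_discipline_share(papers, citations))          # share of cross-discipline edges
print(median_citation_age(papers, citations, "cs"))       # median citation age for "cs"
print(median_citation_age(papers, citations, "history"))  # median citation age for "history"
```

A low cross-discipline share corresponds to strong homophily, and a small median citation age for a field means its papers mostly cite recent work. On the real dataset, papers can carry several arXiv categories, so the single-discipline assumption here would need to be revisited.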

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper makes a substantial contribution to scientometrics research by providing a comprehensive, accessible, large-scale dataset and analysis tools for studying the evolution of scientific knowledge longitudinally. Drawing on 30 years of data, it offers new insights into paradigm shifts, terminology evolution, and field-specific differences in citation culture, and its public release through GitHub and other platforms makes it highly reproducible and reusable. However, two weaknesses should be addressed: the scope is limited to arXiv data, and the accuracy of keyword extraction is not validated.

Related papers worth reading

Foundational work: Its analysis of the growth rate of modern science provides the theoretical background for SciEvo, which tracks the evolution of academic fields over 30 years.
Foundational work: The S2ORC open research corpus provides a methodological precedent for building large-scale scientometric datasets.
Foundational work: Provides the methodological foundation for community detection algorithms and dynamic network analysis.
Different approach: For research trend prediction, the semantic neural network approach and the analysis of scientific evolution based on a 30-year longitudinal dataset offer two distinct methodologies.
Different approach: Takes a similar approach, using large-scale data to analyze the societal impact and public use patterns of scientific research.
Different approach: A related study that analyzes the challenges and opportunities of the scientific research ecosystem with a different methodology.
Different approach: Uses a similar methodology, analyzing the relationship between a paper's position in knowledge space and its citation diffusion pattern.
Different approach: The arXiv-based analysis of interdisciplinary knowledge exchange offers a methodology complementary to the scientometric analysis of risk discourse in AI adoption.
Different approach: A related study that analyzes the structural properties of global scientific collaboration networks.
Different approach: A related study that maps the knowledge structure of language research through semantic network analysis.
Follow-up work: An extension study that explores specific aspects of the Science of Science in greater depth.
Follow-up work: Extends research on mapping changes in the structure of science over time into a large-scale longitudinal analysis using archive data.
Application case: Shows a concrete case of identifying interdisciplinary emergence within the science of science.
Application case: Its analysis of support mechanisms for interdisciplinary papers provides an empirical basis for understanding 30 years of interdisciplinary knowledge exchange patterns.