ScispaCy: Fast and Robust Models for Biomedical Natural Language Processing

Authors: Mark Neumann, Daniel King, Iz Beltagy, Waleed Ammar | Date: 2019-08 | DOI: 10.18653/v1/W19-5034


Essence

Figure 2: Unlabeled attachment score (UAS) perfor-

The authors develop scispaCy, a library built on spaCy for processing biomedical and scientific text. It delivers robust and fast performance on tasks such as POS tagging, dependency parsing, and NER.
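Dependency parsing quality is reported in this review's Figure 2 caption as unlabeled attachment score (UAS): the fraction of tokens whose predicted syntactic head matches the gold head, ignoring the relation label. The following is a minimal sketch of that metric in plain Python. The function name `unlabeled_attachment_score` and the toy head lists are hypothetical illustrations, not part of scispaCy or of the paper's evaluation code.

```python
from typing import Sequence


def unlabeled_attachment_score(
    gold_heads: Sequence[int], pred_heads: Sequence[int]
) -> float:
    """Return the fraction of tokens whose predicted head equals the gold head."""
    if len(gold_heads) != len(pred_heads):
        raise ValueError("gold and predicted head lists must have the same length")
    if not gold_heads:
        raise ValueError("cannot score an empty sentence")
    # Count positions where the predicted head index agrees with the gold one.
    correct = sum(1 for g, p in zip(gold_heads, pred_heads) if g == p)
    return correct / len(gold_heads)


# Hypothetical 5-token sentence: heads are 1-based token indices, 0 marks the root.
gold = [2, 0, 2, 5, 3]
pred = [2, 0, 2, 3, 3]  # the fourth token is attached to the wrong head
uas = unlabeled_attachment_score(gold, pred)
print(f"UAS = {uas:.2f}")  # 4 of 5 heads match, so this prints "UAS = 0.80"
```

In practice the score is usually aggregated over all tokens in a test corpus rather than per sentence, and evaluation scripts differ on details such as whether punctuation tokens are counted.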

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 3/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: The paper presents a well-designed library that addresses practical problems in biomedical NLP, and it makes a meaningful contribution to the community by releasing public resources and benchmarks. It balances accuracy and speed well, so it is highly applicable in practice.

Related papers worth reading

Foundational work
Paper 161 is a BERT-based model specialized for biomedical text processing, and it provides core theory and data for the design of the scispaCy library in paper 734.
Alternative approach
Paper 707 is a pretrained language model for scientific text, and it offers a structural alternative to the biomedical language-processing framework of paper 734.
Alternative approach
ScispaCy provides fast and robust models for summarizing biomedical papers and extracting information from them, which makes it possible to compare methods for automating related-work research across subfields.
Alternative approach
Paper 734 provides a tool for biomedical text processing (ScispaCy), an LLM use case comparable to the materials-science natural-language interface of paper 522.
Alternative approach
It develops datasets for evaluating named entity recognition and context understanding in biomedical and scientific settings, so its purpose is similar to that of SciCUEval.
Alternative approach
Paper 734 provides models specialized for biomedical NER and entity linking, and it can be applied in a way similar to the CoPaLink concept of paper 3251.
Follow-up work
Paper 530 covers language-model fine-tuning paths for optimizing biomedical and medical QA, extending the practical application area of the paper 734 framework.
Application example
In the single-cell data analysis pipeline of paper 699, scispaCy's NER and text-parsing capabilities can be used for real data processing.