What Makes Medical Claims (Un)Verifiable? Analyzing Entity and Relation Properties for Fact Verification

Authors: Amelie Wührl, Yarik Menchaca Resendiz, Lara Grimminger, Roman Klinger | Date: 2024 | DOI: 10.48550/arXiv.2402.01360


Essence

To analyze what determines the verifiability of biomedical claims, this study focuses on entity and relation properties and builds the BEAR-FACT corpus, which includes 447 unverifiable cases.

Motivation

Achievement

Figure 1: Pairwise co-occurrence of verdicts in BEAR-FACT tweets with more than one claim

  1. BEAR-FACT corpus construction: presents the first Twitter dataset that combines 1,448 fact-checked biomedical claims, evidence documents, and structured entity/relation information (30.9% of the claims are unverifiable).
  2. Negative relations are harder to verify: claims with a positive relation (e.g., cause-of) are verified more easily than claims with a negative relation (not-cause-of), and receive a SUPPORTED verdict at a higher rate.
  3. Annotator behavior patterns: annotators mainly refine their searches by normalizing entities to canonical names and by adding constraints to the search query.
  4. Limited effect of domain expertise: no significant difference in annotation reliability was found between medical experts and laypeople.
  5. Verifiability prediction: a fine-tuned RoBERTa model predicts verifiable claims well (.82 F1) but detects unverifiable claims poorly (.27 F1).
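
The gap between the two scores in item 5 is a per-class effect: F1 is computed separately for each label, so a model can do well on the majority class while missing most of the minority class. The following is a minimal sketch of that computation, assuming toy labels that are hypothetical and not taken from BEAR-FACT; the function name `per_class_f1` is illustrative only. It also checks that 30.9% of 1,448 claims is consistent with the 447 unverifiable cases reported in the summary.

```python
def per_class_f1(gold, pred, label):
    """Return the F1 score for a single label, treating it as the positive class."""
    pairs = list(zip(gold, pred))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# Hypothetical toy data (NOT from the corpus): 7 verifiable, 3 unverifiable claims.
gold = ["verifiable"] * 7 + ["unverifiable"] * 3
# The toy classifier finds only one of the three unverifiable claims.
pred = ["verifiable"] * 7 + ["verifiable", "verifiable", "unverifiable"]

f1_verifiable = per_class_f1(gold, pred, "verifiable")
f1_unverifiable = per_class_f1(gold, pred, "unverifiable")
print(f"verifiable F1:   {f1_verifiable:.3f}")
print(f"unverifiable F1: {f1_unverifiable:.3f}")

# Consistency check on the reported corpus figures: 30.9% of 1,448 claims.
unverifiable_count = round(1448 * 0.309)
print(f"unverifiable claims: {unverifiable_count}")
```

On this toy data the minority class scores much lower even though only two claims are misclassified, which is the same pattern as the reported .82 versus .27.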

How

Figure 2: Verdict distribution across claim relation and entity types

Originality

Limitation & Further Study

Evaluation

Novelty: 4.5/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: A meaningful study that focuses on unverifiability in biomedical fact verification and contributes a systematic analysis and a new corpus. However, the low performance on predicting unverifiable claims and the bias introduced by time constraints limit its practical applicability.

Related Papers

Foundational work
Paper 708 combines visual information with textual evidence to offer a distinct data representation for scientific claim verification, and serves as an implementation basis for medical claim verification.
Foundational work
Paper 880 analyzes why medical claims cannot be verified, and thereby supports the need for and validity of the fact-extraction-based verification method in paper 685.
Alternative approach
FactKG (333) applies a knowledge-graph-based method to a similar problem: using entity-relation information in scientific claim verification.
Alternative approach
The Claimver paper covers explainable evidence detection and automated claim verification, so it complements the analysis based on the BEAR-FACT corpus.
Alternative approach
Analyzes the reliability and verifiability of research reports in medicine, which connects to the discussion of verifiability in LLM-based peer review.
Alternative approach
The paper "Explainable biomedical claim verification with large language models" focuses on LLM-based explainability in biomedical claim verification and offers a perspective complementary to paper 880's analysis of verifiability factors.
Follow-up work
Paper 332 addresses complex reasoning over entities, relations, and evidence in AI-based scientific fact verification, extending the scope and difficulty of applied analysis.
Follow-up work
Paper 685 proposes making claim verification more robust through evidence-based fact extraction, and can draw on the verifiability analysis in paper 880.
Follow-up work
Sciclaimhunt provides a large-scale dataset and tasks for evidence-based scientific claim verification, which makes it possible to extend the quantitative study of the unverifiable-claim cases analyzed in paper 880.
Follow-up work
SPOT (881) is a real-world benchmark for automatically verifying errors in papers, and extends claim verifiability analysis to practical use.
Application
"What makes medical claims (un)verifiable?" makes concrete the real impact and risks that generative AI such as ChatGPT poses for scientific claim verification, in relation to the cases of impact on the scientific community in paper 899.
Application
By analyzing the characteristics of verifiable and unverifiable claims in a medical fact-checking setting, "What makes medical claims (un)verifiable?" complements the system evaluation proposed in paper 328 and explains the practical difficulties involved.
Counterargument/Critique
Paper 880 closely analyzes the factors that limit medical claim verification, which supports both the need for and the risks of the "responsible AI integration" discussed in paper 116.