CiteCheck: Retrieval-Grounded Detection of LLM Citation Hallucinations in Scientific Text

Authors: Khashayar Khajavi, Shaghayegh Sadeghi, Rise Adhikari, Alexander Tessier | Date: 2026-05-26 | URL: https://arxiv.org/abs/2605.27700


Essence

Figure 1

Figure 1: The CITECHECK pipeline. A raw citation string is first parsed into structured metadata, then matched against

The paper presents CiteCheck, a hybrid framework for detecting citation errors in LLM-generated scientific text. It combines scholarly retrieval, structured LLM-based comparison, and threshold-based decision making. On a benchmark of 982 physics citations with controlled corruptions, it reaches 88.7 macro-F1 and outperforms baselines that include GPT, Claude, and Gemini.
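
As a rough illustration of the threshold-based decision step, the sketch below compares a parsed citation with a retrieved record field by field and flags the citation when any field falls below a cutoff. It is not the authors' implementation. CiteCheck uses a structured LLM-based comparison, and plain string similarity from Python's `difflib` is swapped in here so the example runs offline. The `Citation` fields, the `verified` and `suspect` labels, and the 0.9 threshold are all assumptions made for illustration.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Dict, List, Optional


@dataclass
class Citation:
    # Hypothetical structured metadata; the paper's actual schema may differ.
    title: str
    authors: List[str]
    year: int


def field_scores(cited: Citation, found: Citation) -> Dict[str, float]:
    """Score each metadata field in [0, 1]; 1.0 means an exact match."""
    title = SequenceMatcher(None, cited.title.lower(), found.title.lower()).ratio()
    cited_names = {name.lower() for name in cited.authors}
    found_names = {name.lower() for name in found.authors}
    union = cited_names | found_names
    # Jaccard overlap of author names; two empty author lists count as a match.
    authors = len(cited_names & found_names) / len(union) if union else 1.0
    year = 1.0 if cited.year == found.year else 0.0
    return {"title": title, "authors": authors, "year": year}


def decide(cited: Citation, found: Optional[Citation], threshold: float = 0.9) -> str:
    """Return 'verified' only if a record was retrieved and every field clears the threshold."""
    if found is None:  # nothing retrieved: treat the reference as unverifiable
        return "suspect"
    scores = field_scores(cited, found)
    return "verified" if min(scores.values()) >= threshold else "suspect"


if __name__ == "__main__":
    cited = Citation("Attention Is All You Need", ["Vaswani"], 2018)
    record = Citation("Attention Is All You Need", ["Vaswani"], 2017)
    print(decide(cited, record))  # the year mismatch drops the minimum score to 0.0
```

Taking the minimum over fields means a single drifted field, such as a wrong year, is enough to flag an otherwise matching citation. This is one simple way to catch subtle metadata drift as well as fully fabricated references.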

Motivation

Achievement

Key achievements:

  1. Benchmark construction: built a physics benchmark of 982 citations with controlled corruptions, ranging from subtle metadata drift to fully fabricated references
  2. Best performance: 88.7% macro-F1 and 88.9% accuracy on the test set, outperforming GPT/Claude/Gemini (including web-search and few-shot variants)
  3. Zero-shot efficiency: although the model is zero-shot, it exceeds the strongest few-shot LLM baseline by 5.8 macro-F1 points and 5.7 accuracy points
  4. Methodological validity: demonstrates the effectiveness of an integrated pipeline that separates retrieval from verification and includes structured comparison and automatic correction
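
For readers less familiar with the headline metric, macro-F1 is the unweighted mean of the per-class F1 scores, so a minority class counts as much as a majority class. The sketch below computes it from scratch. The labels and predictions are toy data invented for illustration, not results from the paper.

```python
from typing import Hashable, Sequence


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred), key=str)
    per_class = []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the class never appears.
        per_class.append(2 * tp / denom if denom else 0.0)
    return sum(per_class) / len(per_class)


if __name__ == "__main__":
    # Toy example: per-class F1 is 2/3 for "real" and 4/5 for "fake".
    truth = ["real", "real", "fake", "fake"]
    preds = ["real", "fake", "fake", "fake"]
    print(round(macro_f1(truth, preds), 4))  # → 0.7333
```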

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 4/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

Overall assessment: CiteCheck offers a practical and timely solution for detecting citation errors in LLM-generated scientific text. Its methodology, which combines scholarly retrieval with structured LLM verification, is sound and performs strongly on the controlled benchmark. There is still room for improvement in domain generalization, dependence on retrieval infrastructure, and the transparency of the second-stage verification logic.

Related Papers

Foundational work
When large language models meet citation: A survey comprehensively covers the relationship between LLMs and citation recommendation and generation, along with their limitations and errors, and provides the theoretical grounding for the problem CiteCheck addresses.
Foundational work
Retrieval-Augmented Generation for Large Language Models systematically reviews recent trends and limitations of retrieval-augmented generation, a core foundation for citation error detection.
Alternative approach
The CiteCheck paper can be compared with this work on complementary benchmarks and practical challenges in LLM-based citation error detection and reliability improvement.
Alternative approach
Both papers focus on the reliability of citations and citation recommendation, but CiteCheck emphasizes error detection while ILCiteR emphasizes interpretability.
Alternative approach
The HLM-Cite paper proposes a hybrid LM-based workflow for generating paper citations, offering an alternative system design to citation hallucination detection.
Follow-up work
The CiteCheck paper advances techniques for verifying citations and hallucinations in LLM-generated content, and shows how it differs in practice from KG-based hallucination mitigation methods.
Follow-up work
The CiteCheck paper is a practical extension of work on improving the reliability of recommended citations and inter-paper relationships, including LLM-based citation error detection.
Follow-up work
A recent RAG-based detection system related to detecting citation manipulation in papers produced with generative AI. It offers a view of current trends in verifying AI-generated content.
Follow-up work
The approach of reducing hallucination through agent-based natural language processing points to an improvement direction similar to CiteCheck's framework for detecting citation-context errors.
Follow-up work
The CiteCheck paper covers benchmarks and methodology for citation hallucination detection, and is closely related to cross-lingual and cross-modal evaluation problems.