CCHall: A novel benchmark for joint cross-lingual and cross-modal hallucinations detection in large language models

Authors: Yongheng Zhang, Xu Liu, Ruoxi Zhou, Qiguang Chen, Hao Fei | Date: 2025 | DOI: 10.48550/arXiv.2505.19108


Essence

Figure 2: (a) Fine-grained performance analysis of MLLMs F1-score for different hallucination types in CCHall.

This paper presents CCHall, a new benchmark for detecting hallucinations of Large Language Models (LLMs) in cross-lingual and cross-modal settings at the same time. Whereas prior work treats cross-lingual and cross-modal scenarios separately, this paper stresses the importance of joint cross-lingual and cross-modal hallucination detection, where both scenarios are combined, and proposes a systematic benchmark to address it.
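The Figure 2 caption refers to F1-scores broken down by hallucination type. As a rough illustration of how such a per-type breakdown can be computed, here is a minimal sketch. It assumes a four-way label set (`none`, `cross_lingual`, `cross_modal`, `joint`) whose names are hypothetical and chosen only for this example, and the gold and predicted labels are toy data, not results from the paper.

```python
from collections import Counter

# Hypothetical label names for illustration only; this review does not list
# the benchmark's actual category names.
LABELS = ["none", "cross_lingual", "cross_modal", "joint"]


def per_label_f1(gold, pred, labels=LABELS):
    """Return {label: F1}, treating each label as a one-vs-rest problem."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for g, p in zip(gold, pred):
        if g == p:
            tp[g] += 1          # correct prediction for label g
        else:
            fp[p] += 1          # p was predicted but is wrong
            fn[g] += 1          # g was the truth but was missed
    scores = {}
    for label in labels:
        denom = 2 * tp[label] + fp[label] + fn[label]
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the label never occurs
        scores[label] = 2 * tp[label] / denom if denom else 0.0
    return scores


def macro_f1(gold, pred, labels=LABELS):
    """Unweighted mean of the per-label F1 scores."""
    scores = per_label_f1(gold, pred, labels)
    return sum(scores.values()) / len(labels)


# Toy data (made up), one entry per evaluated sample.
gold = ["none", "joint", "cross_modal", "joint", "cross_lingual", "none"]
pred = ["none", "cross_modal", "cross_modal", "joint", "none", "none"]

scores = per_label_f1(gold, pred)
for label in LABELS:
    print(f"{label:>14}: {scores[label]:.3f}")
print(f"{'macro':>14}: {macro_f1(gold, pred):.3f}")
```

Reporting the per-label scores next to the macro average makes it visible when a model does well on one hallucination type but poorly on another, which a single aggregate number would hide.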

Motivation

Achievement

How

Figure 3: The construction process of CCHall includes: (a) Raw Multi-modal Dataset Selection (§3.1), (b) Cross-

Originality

Limitation & Further Study

Directions for future research:

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is the first to systematize the problem of joint cross-lingual and cross-modal hallucination detection, which has so far been insufficiently addressed, and it presents CCHall, a comprehensive benchmark for evaluating it. Unlike the fragmented approaches of prior work, it treats the compound hallucination problem of real application settings in an integrated way, which gives it high value, and its extensive model evaluation empirically demonstrates serious limitations of current LLMs. It would be stronger still with more specific information on dataset composition and a fuller account of language diversity.

Related papers worth reading

Foundational work
Provides the methodological foundation for evaluating hallucination in multimodal LLMs.
Alternative approach
Takes a similar approach to evaluating the cross-lingual abilities and hallucination behavior of LLMs.
Alternative approach
The Pelican paper is a framework for mitigating hallucination in vision-LLMs; it takes a different approach from the cross-modal hallucination evaluation in the CCHall paper.
Alternative approach
Hallucination evaluation of LLMs in cross-modal/cross-lingual settings (192) and reliability evaluation of MLLMs on complex inputs such as charts (204) should be cross-referenced.
Alternative approach
A similar benchmark study that evaluates hallucination in multilingual and multimodal LLMs.
Alternative approach
Presents a different benchmark for hallucination detection and evaluation in multilingual LLMs.
Alternative approach
Provides an AI agent benchmark for multilingual and cross-domain settings, enabling comparison with X-WebAgentBench.
Follow-up work
The Shared imagination paper addresses the mutual similarity of hallucinations across LLMs and extends the idea of cross-lingual/cross-modal hallucination evaluation (CCHall) to comparisons across many real models.
Follow-up work
The CiteCheck paper covers a benchmark and methodology for detecting citation hallucination and is closely related to cross-lingual and cross-modal evaluation problems.