Pelican: Correcting Hallucination in Vision-LLMs via Claim Decomposition and Program of Thought Verification

์ €์ž: Pritish Sahu, Karan Sikka, Ajay Divakaran | ๋‚ ์งœ: 2024 | DOI: 10.48550/ARXIV.2407.02352 📄 PDF


Essence

Figure 1

Pelican์˜ ์ „์ฒด ํŒŒ์ดํ”„๋ผ์ธ: ์‹œ๊ฐ์  ํ‘œ(Visual Table) ๊ตฌ์„ฑ, ์ฒญ๊ตฌ(Claim) ๋ถ„ํ•ด, Program-of-Thought ์ฝ”๋“œ ์ƒ์„ฑ, ํ†ตํ•ฉ ๊ฒ€์ฆ ์ข…ํ•ฉ

์‹œ๊ฐ ์–ธ์–ด ๋ชจ๋ธ(LVLM)์˜ ํ™˜๊ฐ(hallucination) ๋ฌธ์ œ๋ฅผ 1์ฐจ ์ˆ ์–ด(first-order predicates) ๊ธฐ๋ฐ˜ ์ฒญ๊ตฌ ๋ถ„ํ•ด์™€ ํŒŒ์ด์ฌ ์ฝ”๋“œ ์ƒ์„ฑ์„ ํ†ตํ•ด ๊ฒ€์ฆํ•˜๊ณ  ๋ณด์ •ํ•˜๋Š” ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์•ˆํ•œ๋‹ค.

Motivation

Achievement

  1. ํ™˜๊ฐ ๊ฐ์†Œ ์„ฑ๋Šฅ: MMHal-Bench์—์„œ ๋‹ค์–‘ํ•œ LVLM ๊ธฐ์ค€์„  ๋Œ€๋น„ 8%-32% ํ™˜๊ฐ ๊ฐ์†Œ, ๊ธฐ์กด ํ™˜๊ฐ ์™„ํ™” ๋ฐฉ๋ฒ• ๋Œ€๋น„ 27% ๊ฐ์†Œ ๋‹ฌ์„ฑ
  2. ๊ฐ•ํ™”๋œ ์ ‘์ง€: ์ค‘๊ฐ„ ๋ณ€์ˆ˜($person_riding ๋“ฑ)๋ฅผ ํ†ตํ•œ ์ •ํ™•ํ•œ ๊ฐ์ฒด ์ธ์Šคํ„ด์Šค ์ฐธ์กฐ๋กœ ๋‹ค์ค‘ ๊ฐ์ฒด ์ฒญ๊ตฌ์—์„œ์˜ ์ •๋ฐ€๋„ ํ–ฅ์ƒ
  3. ์ผ๊ด€์„ฑ ๊ฒ€์ฆ: ๋ถ€๋ถ„ ์ฒญ๊ตฌ ๊ฐ„ ๊ณ„์‚ฐ ๊ณต์œ ๋ฅผ ํ†ตํ•ด ๋ถˆ์ผ์น˜(inconsistency) ์‹๋ณ„ ๋ฐ ์ ์‘์  ๋ณด์ • ๊ฐ€๋Šฅ
  4. ๋‹ค์ค‘ ๋ฒค์น˜๋งˆํฌ ๊ฒ€์ฆ: GAVIE, MME ๋ฐ์ดํ„ฐ์…‹์—์„œ๋„ ์ผ๊ด€๋œ ์„ฑ๋Šฅ ๊ฐœ์„  ์ž…์ฆ

How

Figure 1

4๋‹จ๊ณ„ ํŒŒ์ดํ”„๋ผ์ธ: (Step 1) ์‹œ๊ฐ์  ํ‘œ ๊ตฌ์„ฑ, (Step 2) ์ฒญ๊ตฌ ๋ถ„ํ•ด, (Step 3) Program-of-Thought ์ฝ”๋“œ ์ƒ์„ฑ, (Step 4) ํ†ตํ•ฉ ๊ฒ€์ฆ

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4.5/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: Pelican์€ ์‹œ๊ฐ ์–ธ์–ด ๋ชจ๋ธ์˜ ํ™˜๊ฐ ๋ฌธ์ œ๋ฅผ ์ฒด๊ณ„์ ์œผ๋กœ ์ ‘๊ทผํ•˜๋Š” ๊ฒฌ๊ณ ํ•œ ํ”„๋ ˆ์ž„์›Œํฌ๋กœ, ์ค‘๊ฐ„ ๋ณ€์ˆ˜์™€ ๊ณ„์‚ฐ ๊ณต์œ ๋ผ๋Š” ์‹ค์งˆ์  ๊ฐœ์„ ์„ ํ†ตํ•ด SOTA ๋Œ€๋น„ ์˜๋ฏธ ์žˆ๋Š” ์„ฑ๋Šฅ ํ–ฅ์ƒ์„ ๋‹ฌ์„ฑํ–ˆ์œผ๋‚˜, ๋†’์€ ๊ณ„์‚ฐ ๋น„์šฉ๊ณผ ์‹œ๊ฐ ๋„๊ตฌ ์˜์กด์„ฑ์ด ์‹ค๋ฌด ์ ์šฉ ์‹œ ์ œ์•ฝ์ด ๋  ์ˆ˜ ์žˆ๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

Foundational work
Multivers provides a weakly supervised framework for fact-checking and statement analysis, forming the foundational background for Pelican's claim-decomposition and verification approach.
Foundational work
Paper #541 surveys the counter-evidence problems in scientific fact verification, shedding light on the limitations of #610's claim-decomposition and fact-checking strategy and on its implications for practical use.
Foundational work
Detailed persona-simulation techniques serve as a key foundation for designing LLM-based frameworks that evaluate doctor-patient interactions.
Alternative approach
The TheoremQA paper addresses formal mathematical proofs and theorem-based problem solving and, like Pelican, provides empirical evidence on verifying the correctness of LLM outputs in automated proving.
Alternative approach
Covers complex fact-verification techniques built on program-guided extraction of derived evidence, allowing comparison with Pelican's code-based vision-language verification scheme.
Alternative approach
Paper #657 addresses chart-based automated fact-checking and can be compared with the verification and correction methods that #610 proposes for vision-language hallucination.
Alternative approach
Paper #610 addresses hallucination correction in VLMs and offers an alternative approach suited to comparing the various hallucination-mitigation strategies in paper #396.
Alternative approach
The Pelican paper is a framework for mitigating hallucination in vision-LLMs, taking a contrasting approach to the cross-modal hallucination evaluation in the CCHall paper.
Follow-up work
Extends the analysis of text-generative AI's potential for games and education into the clinical domain through realistic interactive persona simulation.
Follow-up work
M2F addresses automatic formalization of mathematical literature and theorem-proof correspondence, offering practical avenues for extending Pelican's proof-error correction framework.
Application
Provides a benchmark for correcting medical LVLM errors in real clinical settings, showing how Pelican's hallucination-correction approach applies to a specialized domain.
← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •