Every part matters: Integrity verification of scientific figures based on multimodal large language models

Authors: Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, Wei Lu | Date: 2024 | DOI: N/A


Essence

Figure 1

Figure 1: Comparison of the text-image alignment task for natural images and for scientific paper figures. Aligning text with a scientific figure requires parsing each module element, aligning the text to those elements, and identifying the elements that remain unaligned.

This study proposes "Figure Integrity Verification", a new task for fine-grained alignment between text and visual elements in scientific paper figures. To support it, the authors build the Figure-seg dataset and the Every Part Matters (EPM) framework. Together these substantially improve the understanding and verification of complex, domain-specific scientific figures.
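
To make the task concrete, here is a toy sketch of its input and output. It is not the EPM model: in the paper the alignment comes from a multimodal LLM, whereas here it is simply handed in as a dictionary. The names `FigureElement` and `split_by_alignment`, and the example element ids, are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class FigureElement:
    element_id: str  # hypothetical id of one parsed module in the figure
    label: str       # text shown inside or beside that module


def split_by_alignment(
    elements: List[FigureElement],
    text_to_element: Dict[str, str],
) -> Dict[str, List[str]]:
    """Split parsed elements into those described by the text and those not.

    text_to_element maps a text span to the id of the element it describes.
    """
    known = {e.element_id for e in elements}
    referenced = set(text_to_element.values())
    stray = referenced - known
    if stray:
        raise ValueError(f"text refers to unknown elements: {sorted(stray)}")
    return {
        "aligned": sorted(referenced),
        "unaligned": sorted(known - referenced),
    }


elements = [
    FigureElement("encoder", "Encoder"),
    FigureElement("decoder", "Decoder"),
    FigureElement("loss", "Loss"),
]
text_to_element = {
    "the encoder compresses the input": "encoder",
    "the decoder reconstructs it": "decoder",
}
result = split_by_alignment(elements, text_to_element)
print(result["unaligned"])  # ['loss']
```

The unaligned list is the part that makes this an integrity check rather than plain grounding: an element the text never accounts for is reported instead of being silently ignored.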

Motivation

Achievement

Figure 2

Figure 2: Overview of the dataset construction process for fine-grained alignment of scientific figures.

  1. Large gain in text-figure alignment: improves CIoU by 22.53% and gIoU by 45.13%, well beyond prior state-of-the-art (SOTA) methods.
  2. Stronger detection of unaligned elements: improves CIoU by 4.90% and gIoU by 4.52% when detecting unaligned figure elements, demonstrating strong understanding of complex figures.
  3. First fine-grained alignment dataset: combines an automated process with manual verification to build the high-quality Figure-seg dataset, which is essential for detailed parsing and alignment analysis of scientific figures.
  4. Synergy from background knowledge: integrating background knowledge about the spatial-semantic properties of figure elements gives positive results on about 70% of the metrics, underscoring how scientific figures differ from natural images.
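
The summary reports CIoU and gIoU without defining them. The following is a minimal sketch, assuming they follow the convention common in segmentation benchmarks: gIoU as the mean of per-sample IoUs, and CIoU as cumulative intersection over cumulative union. The function names and the flattened-mask representation are illustrative and not taken from the paper.

```python
from typing import List, Sequence, Tuple

Mask = Sequence[int]  # flattened binary mask, 1 = pixel belongs to the element


def intersection_and_union(pred: Mask, target: Mask) -> Tuple[int, int]:
    """Count pixels in the intersection and in the union of two binary masks."""
    if len(pred) != len(target):
        raise ValueError("masks must have the same number of pixels")
    inter = sum(1 for p, t in zip(pred, target) if p and t)
    union = sum(1 for p, t in zip(pred, target) if p or t)
    return inter, union


def giou(preds: List[Mask], targets: List[Mask]) -> float:
    """Mean of per-sample IoU. An empty union is scored 1.0 (an assumption)."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many, non-empty predictions and targets")
    scores = []
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        scores.append(inter / union if union else 1.0)
    return sum(scores) / len(scores)


def ciou(preds: List[Mask], targets: List[Mask]) -> float:
    """Total intersection over total union, accumulated across all samples."""
    if not preds or len(preds) != len(targets):
        raise ValueError("need equally many, non-empty predictions and targets")
    total_inter = 0
    total_union = 0
    for pred, target in zip(preds, targets):
        inter, union = intersection_and_union(pred, target)
        total_inter += inter
        total_union += union
    return total_inter / total_union if total_union else 1.0


# Two toy samples: a small element (IoU 1/2) and a large one (IoU 1/8).
preds = [[1, 1, 0, 0], [1, 0, 0, 0, 0, 0, 0, 0]]
targets = [[1, 0, 0, 0], [1, 1, 1, 1, 1, 1, 1, 1]]
g = giou(preds, targets)  # (0.5 + 0.125) / 2 = 0.3125
c = ciou(preds, targets)  # (1 + 1) / (2 + 8) = 0.2
```

Under this reading the two numbers can diverge: the cumulative variant is dominated by large elements, while the per-sample mean weights every element equally, which matters for figures that mix small labels with large panels.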

How

Figure 4

Figure 4: Overall framework for implementing the integrity verification task. Panel (a) shows the two evaluation criteria.

Figure 5

Figure 5: Detailed illustration of the Chain-of-Attribute (CoA) reasoning process.

Originality

Limitation & Further Study

Evaluation

Novelty: 4.5/5 | Technical Soundness: 4/5 | Significance: 4.5/5 | Clarity: 4/5 | Overall: 4.25/5

Overall: The paper clearly defines an unmet research gap, namely fine-grained text-alignment analysis of scientific figures, and addresses it systematically with a new task, a high-quality dataset, and an effective MLLM framework, making a substantive contribution to multimodal understanding. Domain-specific adaptability and better computational efficiency will nonetheless be decisive for real-world use.

Related papers worth reading

Foundational work
A related methodological study that underpins scientific figure analysis and integrity verification.
Foundational work
The Figure Integrity Verification task and the multimodal figure-caption alignment data of 323 form an essential basis for developing SciFIBench's figure QA benchmark.
Foundational work
Centers on the importance of data integrity and of detecting AI-generated images when figures such as scientific images and charts are manipulated or verified, and provides a basis for discussing threats to data integrity.
Alternative approach
A study that presents an alternative approach to multimodal understanding and verification of scientific documents.
Alternative approach
A study that addresses a similar problem of aligning and verifying figures and text in scientific papers.
Alternative approach
The Every Part Matters paper focuses on fine-grained alignment and authenticity verification of scientific figures, which complements MatViX's information-extraction perspective.
Alternative approach
Multi-llm collaborative caption generation experiments with several collaborative LLM strategies for generating figure captions in scientific documents, and can be compared with the EPM approach.
Alternative approach
A study that covers a similar approach to verifying consistency between visual elements and text in scientific papers.
Follow-up work
MatViX proposes a multimodal benchmark for extracting complex structured information from scientific papers, extending the Figure Integrity Verification problem to practical information-extraction applications.