Investigating zero- and few-shot generalization in fact verification

Authors: Liangming Pan, Yunxiang Zhang, Min-Yen Kan | Date: 2023 | DOI: N/A


Essence

This paper is the first comprehensive study of how well fact verification (FV) models generalize across domains. It builds a benchmark of 11 FV datasets, systematically analyzes transfer performance in zero-shot and few-shot settings, and proposes improvements through domain-specific pretraining and data augmentation.

Motivation

Achievement

Figure 1: tSNE plot of [CLS] representations of each

How

Figure 2: Confusion matrices (normalized over columns) of generated claims on four datasets. The desired label

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is an important study that systematically investigates cross-domain generalization in fact verification for the first time, and its unified benchmark and empirical analysis yield insights of high practical value. However, the range of models covered is limited and the proposed improvement methods face practical constraints, so follow-up work with stronger techniques and modern LLMs is needed.

Related papers worth reading

Foundational work
"Large Language Models are Zero Shot Hypothesis Proposers" evaluates zero-shot ability in concrete terms, providing a basis for paper 441's analysis of zero-shot and transfer ability in fact-checking.
Foundational work
Provides a systematic methodology for explainable claim verification and evidence extraction, laying groundwork for the study of cross-domain generalization.
Alternative approach
"Factkg: Fact verification via reasoning on knowledge graphs" tackles cross-domain generalization in fact verification through graph-based reasoning, which contrasts with the transfer learning problem studied in this paper.
Alternative approach
A program-based fact-checking method over complex evidence; useful for analyzing how conventional fact verification differs from approaches that integrate reasoning.
Alternative approach
A paper on the robustness of claim verification models and on fact detection approaches; useful for comparing different fact verification strategies.
Alternative approach
Analyzes the zero- and few-shot generalization problem in fact verification, showing the limits of scientific claim verification and of model transferability.
Follow-up work
Paper 441 addresses zero/few-shot generalization in scientific fact verification, which connects to the generalization challenge of naturally occurring yes/no questions addressed by paper 172.
Follow-up work
A weakly supervised fact verification method that can be used in experiments on strengthening zero/few-shot generalization; useful for exploring directions for further development.
Follow-up work
Focuses on the limitation of missing counter-evidence in fact-checking tasks, which complements the evaluation of zero/few-shot generalization.
Follow-up work
Extends model capability through unsupervised pretraining and language models; its impact can be compared across scientific fact verification tasks.
Follow-up work
The ReSearch paper addresses improving the reasoning generalization of LLMs through search-based reinforcement learning, proposing a practical way to overcome the generalization limits raised in paper 441.
Application
Presents zero/few-shot learning results from human pose estimation as an example that may be applicable to the fact verification task.
Counterpoint/Critique
Paper 441 covers the limits and strengths of zero-shot and few-shot generalization in fact verification, which informs how the performance of paper 859's framework should be interpreted.