Wrong-of-Thought: An Integrated Reasoning Framework with Multi-Perspective Verification and Wrong Information

Authors: Yongheng Zhang, Qiguang Chen, Jingxuan Zhou, Peng Wang, Jiasheng Si, Jin Wang, Wenpeng Lu, Libo Qin | Date: 2024 | DOI: arXiv:2410.04463


Essence

Figure 1: Existing multi-thought integration methods (a) rely on a single verification step and do not use wrong information, whereas WoT (b) provides multi-perspective verification and wrong information utilization.

The paper proposes WoT (Wrong-of-Thought), a framework that improves the reasoning performance of large language models (LLMs) by verifying answers from multiple perspectives and reusing information about earlier errors. By addressing two weaknesses of the existing XoT approach, its single verification method and its disregard of wrong information, WoT achieves strong results across 8 datasets and 5 LLMs.

Motivation

Achievement

Figure 3: Structure of the WoT framework. It consists of three core modules: planning and solving, multi-perspective verification, and wrong information utilization.

  1. Comprehensive performance gains: WoT outperforms all existing baselines across 8 benchmark datasets (GSM8K, GSM-Hard, Algebra, MultiArith, and others) and 5 LLMs (Mistral-7B, Qwen-7B/14B, Gemini-1.0-Pro, GPT-3.5-Turbo)
  2. Ability to solve hard computation problems: the gains are most pronounced on problems that require complex mathematical reasoning
  3. Effectiveness of wrong information utilization: presenting the earlier wrong reasoning back to the LLM lowers the probability that it repeats a similar error

How

Figure 2: The XoT framework. After a reasoning method is selected, the result is judged through assertion verification; on failure, the framework switches to a different method and restarts.
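The select, verify, and switch loop described in the Figure 2 caption can be sketched roughly as below. This is a minimal illustration, not the XoT authors' implementation: the function name `xot_loop`, the `solve` and `verify` callables, and the method labels passed in are all hypothetical stand-ins for LLM calls.

```python
from typing import Callable, Optional, Sequence


def xot_loop(
    question: str,
    methods: Sequence[str],
    solve: Callable[[str, str], str],
    verify: Callable[[str, str], bool],
) -> Optional[str]:
    """Try each reasoning method in order and return the first verified answer.

    `solve(question, method)` and `verify(question, answer)` are hypothetical
    hooks; in practice each would wrap a prompt to an LLM.
    """
    for method in methods:
        answer = solve(question, method)
        if verify(question, answer):
            # A single verification pass decides whether to stop.
            return answer
        # On failure the answer is simply discarded and the next method is
        # tried; nothing about the failed attempt is carried forward.
    return None
```

Note that the failed attempt is thrown away on each switch, which is the gap the wrong information utilization module is meant to close.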

Multi-Perspective Verification

$$\hat{V} = \arg\max_{V_t \in V} \sum_{t=1}^{N} \sum_{R \in M_i} \mathbb{1}(V_t = R)$$
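One plausible reading of this formula is a majority vote: each of the N verification perspectives returns a verdict, and the verdict with the most votes is kept. The sketch below illustrates only that voting step under this assumption. The perspective labels in the usage example and the tie-breaking rule (a tie counts as a rejection) are illustrative choices, not details taken from this review.

```python
from collections import Counter
from typing import Mapping


def majority_verdict(verdicts: Mapping[str, bool]) -> bool:
    """Return True when more perspectives accept the answer than reject it.

    `verdicts` maps a perspective label to its accept/reject decision.
    A tie is treated as a rejection (an assumption made for this sketch).
    """
    if not verdicts:
        raise ValueError("at least one verification verdict is required")
    counts = Counter(verdicts.values())
    return counts[True] > counts[False]


# Hypothetical usage with three illustrative perspective labels.
example = {"assertion": True, "process": False, "result": True}
accepted = majority_verdict(example)
```

With an odd number of perspectives a tie cannot occur, so the tie rule only matters if the set of verifiers is changed.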

Wrong Information Utilization

$$\hat{R} = \arg\max_{R \in M_i} P(R|Q, I, WI)$$
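Reading Q as the question, I as the instruction, and WI as the wrong information collected from earlier failed attempts (these symbol meanings are assumed, since this page does not define them), the conditioning in the formula amounts to placing the failed reasoning in the prompt for the next attempt. The sketch below shows one way such a prompt might be assembled. The function name and the prompt wording are hypothetical, not the authors' template.

```python
from typing import Sequence


def build_retry_prompt(
    question: str,
    instruction: str,
    wrong_info: Sequence[str],
) -> str:
    """Assemble a prompt that conditions the next attempt on Q, I, and WI.

    The wording of the warning line is an illustrative assumption.
    """
    parts = [instruction.strip(), f"Question: {question.strip()}"]
    if wrong_info:
        parts.append(
            "Previous attempts that failed verification "
            "(do not repeat these mistakes):"
        )
        # Number each failed attempt so the model can tell them apart.
        parts.extend(
            f"{i}. {attempt.strip()}"
            for i, attempt in enumerate(wrong_info, start=1)
        )
    parts.append("Answer:")
    return "\n".join(parts)
```

When `wrong_info` is empty the function falls back to a plain first-attempt prompt, so the same builder can serve both the initial call and every retry.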

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: WoT consistently improves LLM reasoning performance through simple but effective refinements, and extensive experiments support its validity. Further study is still needed on the verification overhead and on deeper ways of exploiting wrong information.

Related Papers

Foundational work
The Selfcheck paper develops a method by which an LLM verifies its own generated reasoning steps, and can be seen as early work behind the multi-perspective verification idea in Wrong-of-Thought.
Foundational work
Introduces a self-critique-based reasoning procedure, providing background for Wrong-of-Thought's multi-perspective verification and validity assessment.
Follow-up work
790 addresses teaching LLMs to self-debug, offering a practical way to strengthen the multi-perspective verification framework of Wrong-of-Thought (887).
Follow-up work
The Critique-GRPO paper proposes strengthening LLM reasoning through natural-language critique and self-analysis, and can serve as a reference for a practical application of the WoT framework.
Counterpoint/Critique
The paper Large Language Models are Zero Shot Hypothesis Proposers emphasizes that LLMs can carry out creative reasoning even without multiple perspectives or error feedback, in contrast to the multi-verification strategy of Wrong-of-Thought (887).
Counterpoint/Critique
Presents a critical view of the limits of LLM self-correction and self-verification and of methods such as XoT, which underscores the need for the Wrong-of-Thought framework.