If in a Crowdsourced Data Annotation Pipeline, a GPT-4 | Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems

Authors: Zeyu He, Chieh-Yang Huang, Chien-Kuang Cornelia Ding, Shaurya Rohatgi, Ting-Hao Kenneth Huang | DOI: 10.1145/3613904.3642834


Essence

Figure 4: Aggregation Methods for All Workers, Exclude-By-Worker, and Exclude-By-Batch. Among the various models and

This study compares the data-labeling ability of GPT-4 with that of an optimized crowdsourcing pipeline. It shows that although GPT-4 performs better on its own, combining its labels with crowd labels through label aggregation can achieve higher accuracy.
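
The label-aggregation idea in the summary above can be illustrated with a small sketch. It is only a toy, assuming plain majority voting in which the GPT-4 label counts as one extra vote; the paper compares several aggregation methods, and this sketch is not presented as any of them. The function names, item IDs, and label values are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional


def majority_vote(labels: List[str]) -> str:
    """Return the most frequent label; ties go to the label that appears first."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    best = max(counts.values())
    for label in labels:
        if counts[label] == best:
            return label
    raise AssertionError("unreachable: some label always has the top count")


def aggregate(
    crowd: Dict[str, List[str]],
    model: Optional[Dict[str, str]] = None,
) -> Dict[str, str]:
    """Aggregate crowd labels per item.

    If `model` is given, its label for an item is appended as one more vote
    (an assumption made for illustration, not the paper's procedure).
    """
    result = {}
    for item, votes in crowd.items():
        votes = list(votes)  # copy so the caller's lists are not modified
        if model is not None and item in model:
            votes.append(model[item])
        result[item] = majority_vote(votes)
    return result


# Hypothetical toy data: three items, crowd votes, and one model label each.
crowd_labels = {
    "s1": ["finding", "method"],
    "s2": ["method", "method"],
    "s3": ["finding", "method", "method"],
}
gpt4_labels = {"s1": "method", "s2": "finding", "s3": "finding"}

crowd_only = aggregate(crowd_labels)
crowd_plus_model = aggregate(crowd_labels, gpt4_labels)
```

In this toy data the extra model vote breaks the crowd's tie on `s1`, while on `s2` the two agreeing crowd workers outvote the model. Whether such a combination helps in practice depends on the aggregation method and the quality of each label source.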

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: This paper carefully addresses the methodological problems of earlier GPT-4 vs. crowd-worker comparison studies. It verifies the accuracy of an optimized crowdsourcing pipeline and demonstrates that a hybrid approach with GPT-4 can deliver better performance, which gives it high academic value. In particular, it is an important study that proposes a new role for crowdsourcing in the LLM era.

Related Papers Worth Reading

Foundational work
The paper "Towards effective extraction and evaluation of factual claims" proposes reliability and effectiveness criteria and a framework for automated data labeling and extraction, which supports paper 905's discussion of LLM-based data-label accuracy.
Different approach
Unlike Wordcraft (886), paper 905 explores collaboration between humans and GPT-4 in terms of data-annotation accuracy and efficiency.
Different approach
Paper 339 provides a challenging dataset for automated mathematical theorem proving, offering a contrast in data type with paper 905's discussion of AI and crowdsourced data quality.
Different approach
Paper 905 addresses the accuracy and reliability of AI- and crowdsourcing-based data annotation pipelines, so it reads well alongside work on the practical limits of applying AI-based literature search.
Follow-up work
Paper 206 automates crowdsourced text annotation with ChatGPT and is directly related to the performance comparison in paper 905.
Follow-up work
Paper 317 proposes a method for improving NLI performance with external knowledge graphs, and connects as follow-up work to paper 905, which deals with LLM data labeling and improving discriminative ability.
Follow-up work
From the standpoint of evaluating the trustworthiness of large language models, data-labeling quality can be related to LLM reliability issues.
Application
The evaluation of LLM reliability in the data-labeling process, and the challenges of real-world application, can be compared in connection with an LLM trust framework.