LLMs Outperform Outsourced Human Coders on Complex Textual Analysis

Authors: Vicente J. Bermejo, Andres Gago, Ramiro H. Gálvez, Nicolás Harari | Date: 2024 | DOI: 10.2139/ssrn.5020034


Essence

Figure 2

Figure 2 replicates Figure 1, presenting outcomes by task difficulty for each article (see

This paper compares several LLMs, including GPT-3.5-turbo, GPT-4-turbo, Claude 3 Opus, and Claude 3.5 Sonnet, against outsourced human coders on 210 Spanish-language news articles. It shows that the LLMs consistently outperform the human coders across five complex natural language processing tasks, ranging from named entity recognition (NER) to identifying political criticism.

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This is an important study that systematically demonstrates that LLMs clearly outperform outsourced human coders and are a cost-effective tool for text analysis. Its emphasis on the practicality of zero-shot learning and on multilingual performance is valuable, but the limited sample size and narrow task scope still need to be addressed.

Related Papers

Foundational work
206 compares the natural language processing quality of LLMs against humans and crowdworkers, providing a theoretical and experimental foundation for 511's comparative evaluation of LLMs versus human coders in in-depth news text analysis.
Alternative approach
This study examines the limits and potential of LLM document comprehension in paper question answering using an Agent-RAG approach, which helps when comparing performance across a variety of coding and analysis tasks.
Alternative approach
This work compares how LLMs provide qualitative feedback relative to human reviewers when evaluating papers and research, offering a broader perspective on human/AI comparison.
Follow-up work
543 (MLCopilot) is a system that supports LLM-based complex text analysis and large-scale data interpretation; it serves as a practical tooling and application example of 511's finding that LLMs outperform human coders.
Follow-up work
This survey compares perceptions and performance of LLMs and humans among researchers, and it connects directly to analyses of actual performance differences between LLMs and human expert groups.
Follow-up work
663 analyzes the enhancement effect of multi-agent LLM systems in clinical decision-making, extending 511's analysis of LLM superiority over humans to another domain.
Application
This work provides quantitative data on the real-world use of AI copilots and similar tools in research and coding, serving as a practical application example of 511's conclusion that LLMs consistently outperform humans at text analysis.