HuggingGPT: Solving AI Tasks with ChatGPT and its Friends in Hugging Face

Authors: Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, Yueting Zhuang | Date: 2023 | URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/77c33e6a367922d003ff102ffb92b658-Paper-Conference.pdf


Essence

Figure 1: Language serves as an interface for LLMs (e.g., ChatGPT) to connect numerous AI models

HuggingGPT is an LLM-based agent that uses ChatGPT as a controller to automatically select and coordinate the various AI models hosted on Hugging Face, and thereby solves complex multimodal AI tasks.
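The controller pattern described above can be sketched in a few lines. The paper describes four stages (task planning, model selection, task execution, response generation). The sketch below stubs every stage so it runs offline: `MODEL_REGISTRY`, `plan_tasks`, `select_model`, `run_pipeline`, and all model names are hypothetical, and picking the most-downloaded candidate is a stand-in assumption for the LLM's own choice, not the authors' implementation.

```python
from typing import Dict, List

# Hypothetical stand-ins for hosted expert models, keyed by task type.
MODEL_REGISTRY: Dict[str, List[dict]] = {
    "image-to-text": [
        {"name": "captioner-small", "downloads": 120,
         "run": lambda x: f"caption of {x}"},
        {"name": "captioner-large", "downloads": 900,
         "run": lambda x: f"detailed caption of {x}"},
    ],
    "text-to-speech": [
        {"name": "tts-basic", "downloads": 300,
         "run": lambda x: f"audio<{x}>"},
    ],
}


def plan_tasks(request: str) -> List[dict]:
    """Stage 1 (stub): a real controller would prompt the LLM for this plan."""
    return [
        {"id": 0, "task": "image-to-text", "input": "photo.jpg", "dep": None},
        {"id": 1, "task": "text-to-speech", "input": None, "dep": 0},
    ]


def select_model(task_type: str) -> dict:
    """Stage 2 (stub): take the most-downloaded candidate (an assumption)."""
    return max(MODEL_REGISTRY[task_type], key=lambda m: m["downloads"])


def run_pipeline(request: str) -> str:
    results: Dict[int, str] = {}
    used: List[str] = []
    for task in plan_tasks(request):
        model = select_model(task["task"])
        # Stage 3: feed a dependency's output forward when one is declared.
        arg = results[task["dep"]] if task["dep"] is not None else task["input"]
        results[task["id"]] = model["run"](arg)
        used.append(model["name"])
    # Stage 4 (stub): a real controller would ask the LLM to write the reply.
    final = results[max(results)]
    return f"Used {', '.join(used)}; final output: {final}"


print(run_pipeline("Describe photo.jpg and read the description aloud"))
```

The point of the sketch is the data flow: each planned task names a task type and an optional dependency, so the output of one expert model can become the input of the next while the controller only passes text-like handles around.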

Motivation

Achievement

Figure 2: Overview of HuggingGPT. With an LLM (e.g., ChatGPT) as the core controller and

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

Overall assessment: HuggingGPT presents a creative and practical approach that uses language as a universal interface to connect an LLM with diverse domain-specific models, and it makes an important contribution to solving complex multimodal tasks and to work toward AGI.

Related papers worth reading

Foundational work
AutoGen is a multi-agent framework for automating LLM tasks, and it forms the basis of HuggingGPT's distributed controller-tool architecture.
Paper 464 surveys LLM-based multi-agent systems as a whole and provides a theoretical basis for the structural design of paper 412 (HuggingGPT).
Paper 412 proposes a collaboration structure among diverse agents (HuggingGPT), which underlies the multi-tool chaining and execution framework of paper 735.
The HuggingGPT paper presents an early concept of an agentic framework that integrates an LLM with diverse external tools, providing a theoretical foundation for ToolUniverse's model built on an open-source tool ecosystem.
The HuggingGPT paper is an LLM-based framework for automating multi-tool and agent collaboration, and it lays the technical groundwork for the orchestration concept in paper 137.
Paper 412 (HuggingGPT) is a large-scale framework for automating multimodal tasks and can be compared with EAA's multimodal tool integration.
Paper 412 covers LLM-based combination of diverse scientific tools and explains the principles behind multi-agent and tool use.
HuggingGPT (412) offers the concept of an LLM-centered interface for orchestrating heterogeneous agents, providing a theoretical basis for the Eywa design in paper 3126.
Alternative approaches
The Hiagent paper proposes an alternative architecture that improves task efficiency through hierarchical memory management when large-scale agents solve long-horizon problems.
A suitable comparative case for how LLM-agent-based interpretation of complex data is carried out across scientific fields (astronomy, biology, and others).
The HuggingGPT paper is likewise an LLM-agent framework that automates a range of tasks using various APIs and tools, which makes it well suited for comparison with AnyTool.
HuggingGPT shows how an LLM such as ChatGPT can mobilize diverse tools and distributed agents, and it can be compared with WebWatcher's agent architecture.
Follow-up work
Paper 412 introduces the HuggingGPT system, which links an LLM with multiple AI models so that they cooperate on complex tasks, and it extends the multi-tool integration paradigm of paper 499.
The paper LLM Agents Making Agent Tools further extends techniques for LLM tool composition and automation, and offers a view of research that followed HuggingGPT's practical deployment.
Paper 352 covers the core principles and evolutionary direction of the era of agent-based autonomous science, and analyzes the possibility that the multimodal agent coordination of HuggingGPT (412) can be scaled up.
Foundation-Model Surrogates Enable Data-Efficient Active Learning (346) applies various model-integration techniques to experimental science problems and extends the multimodal agent concept of paper 412 to scientific discovery.
Applications
Paper 412 is a case of applying the HuggingGPT agent framework to a variety of AI task problems, and it helps extend the integrated software-development process concept of paper 205 to other scientific fields in practice.
HuggingGPT (412) is a concrete implementation of an LLM-based multi-agent system and serves as a practical link to the conceptual taxonomy and limitation analysis in paper 464.
HuggingGPT (412) is a framework that automates multiple AI tasks, and it can be linked to the GUI agents and the real-world tool integration and application cases emphasized in paper 849.