ChemToolAgent: The Impact of Tools on Language Agents for Chemistry Problem Solving

์ €์ž: Botao Yu, Frazier N. Baker, Ziru Chen, Garrett Herb, Boyu Gou | ๋‚ ์งœ: 2024 | URL: https://arxiv.org/abs/2411.07228 📄 PDF


Essence

Figure 1: Our ChemToolAgent framework.

ChemToolAgent is an LLM agent for chemistry problem solving that integrates 29 tools. It shows that tool augmentation helps on specialized tasks but does not outperform the base LLM on general chemistry questions.
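To make the idea of a tool-augmented agent concrete, here is a minimal sketch of a tool-dispatch loop. It is not the authors' implementation: the `molar_mass` tool, its `El:count` input format, the `scripted_policy` stand-in for the LLM, and the `run_agent` helper are all hypothetical names chosen for illustration, and the real agent's 29 tools are not reproduced.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical toy data and tool; assumptions for illustration only.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999}


def molar_mass(formula: str) -> str:
    """Toy tool: molar mass for a formula given as space-separated 'El:count' pairs."""
    total = 0.0
    for part in formula.split():
        element, count = part.split(":")
        total += ATOMIC_MASS[element] * int(count)
    return f"{total:.3f}"


TOOLS: Dict[str, Callable[[str], str]] = {"molar_mass": molar_mass}


def scripted_policy(question: str, observations: List[str]) -> Tuple[str, str]:
    """Stand-in for the LLM: returns an (action, argument) pair.

    A real agent would prompt a language model here; this script calls
    one tool and then finishes with the tool's observation.
    """
    if not observations:
        return ("molar_mass", "H:2 O:1")
    return ("finish", f"{observations[-1]} g/mol")


def run_agent(
    question: str,
    policy: Callable[[str, List[str]], Tuple[str, str]],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Alternate between choosing an action and observing a tool result."""
    observations: List[str] = []
    for _ in range(max_steps):
        action, argument = policy(question, observations)
        if action == "finish":
            return argument
        tool = tools.get(action)
        # An unknown tool name becomes an observation instead of a crash.
        observations.append(tool(argument) if tool else f"unknown tool: {action}")
    return "no answer within step budget"


answer = run_agent("What is the molar mass of water?", scripted_policy, TOOLS)
print(answer)
```

In this sketch the policy is the only part that would change between a base LLM and a tool-augmented one: a base LLM answers directly, while the agent may spend steps on tool calls before finishing.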

Motivation

Achievement

Figure 2: The error statistics of CTA (GPT) on SMolInstruct (102 errors) and MMLU-Chemistry (64 errors).

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: ChemToolAgent๋Š” ๋„๊ตฌ ์ฆ๊ฐ• ์—์ด์ „ํŠธ์˜ ์žฅ๋‹จ์ ์„ ๋ช…ํ™•ํžˆ ๊ทœ๋ช…ํ•œ ์ค‘์š”ํ•œ ์‹ค์ฆ์  ์—ฐ๊ตฌ์ด๋ฉฐ, ๋„๊ตฌ๊ฐ€ ํ•ญ์ƒ ์„ฑ๋Šฅ์„ ๊ฐœ์„ ํ•˜์ง€ ์•Š๋Š”๋‹ค๋Š” ๋ฐ˜์ง๊ด€์  ๋ฐœ๊ฒฌ์€ ํ–ฅํ›„ ํ™”ํ•™ LLM ์—์ด์ „ํŠธ ์„ค๊ณ„์— ์ค‘์š”ํ•œ ํ•จ์˜๋ฅผ ์ œ๊ณตํ•œ๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

Foundational work
ReAct is an LLM design approach that strengthens tool-use ability, and it provides the core theoretical foundation for ChemToolAgent's tool-augmented approach.
Foundational work
Paper 115 surveys LLM-based chemistry tool-augmentation techniques as a whole, and forms the conceptual basis for the ChemToolAgent design in paper 214.
Foundational work
The ChemToolAgent paper experiments with tool-use-centered LLM agent design and feedback-based generation evaluation, which provides a basis for Paper2Web's iterative-refinement agent design.
Foundational work
Paper 214 examines how LLM tool use contributes to better structured reasoning in chemistry, and serves as the theoretical basis for the argumentative framework of paper 3172.
Alternative approach
CLAM tackles a different challenging problem, resolving imprecise LLM queries, but its content is complementary from the perspective of LLM tool use and interaction.
Alternative approach
Chemist-X also uses an LLM-based agent for chemistry problem solving, but differs in its approach to tools and workflow.
Alternative approach
Paper 210 also proposes an agent that couples chemistry tools with an LLM, but its problem approach and tool-integration method differ from those of paper 214.
Alternative approach
Like paper 138, paper 214 aims at LLM-based chemical synthesis and experiment automation, but the tools used and the benchmark environment differ.
Follow-up work
Paper 214 is an agent that solves problems by integrating multiple chemistry tools, and can be seen as an extended application of the framework in paper 176.
Follow-up work
The paper "Modular large language model agents for multi-task computational chemistry" explores modular tool extension across diverse chemistry tasks.
Application
The LLM-agent-based chemical reaction simulation and charge prediction tools (214) demonstrate the practical usability of the BOS-Lig dataset.