MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses

์ €์ž: Zonglin Yang, Wanhao Liu, Ben Gao, Tong Xie, Yuqiang Li | ๋‚ ์งœ: 2024 | DOI: 10.48550/arXiv.2410.07076 📄 PDF


Essence

Figure 1: The MOOSE-Chem framework. It receives b and I as input, and outputs a list of ranked hypotheses.

๋ณธ ๋…ผ๋ฌธ์€ LLM์ด ํ™”ํ•™ ๋ถ„์•ผ์—์„œ ์ž๋™์œผ๋กœ ์ƒˆ๋กœ์šด ๊ฐ€์„ค์„ ๋ฐœ๊ฒฌํ•  ์ˆ˜ ์žˆ๋Š”์ง€๋ฅผ ์กฐ์‚ฌํ•œ๋‹ค. ์ €์ž๋“ค์€ ๊ฐ€์„ค ๋ฐœ๊ฒฌ์„ ๋ฐฐ๊ฒฝ ์ง€์‹๊ณผ ์˜๊ฐ ๊ฐœ๋…์œผ๋กœ ๋ถ„ํ•ดํ•˜๋Š” ์ˆ˜ํ•™์  ์ ‘๊ทผ์„ ์ œ์•ˆํ•˜๊ณ , ์ด๋ฅผ ๊ตฌํ˜„ํ•œ MOOSE-Chem ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ๊ฐœ๋ฐœํ•˜์—ฌ Nature/Science ์ˆ˜์ค€์˜ 51๊ฐœ ํ™”ํ•™ ๋…ผ๋ฌธ์—์„œ ๊ฐ€์„ค์„ ์žฌ๋ฐœ๊ฒฌํ•  ์ˆ˜ ์žˆ์Œ์„ ๋ณด์ธ๋‹ค.

Motivation

Achievement

Figure 2: Overview of the input and output of the MOOSE-Chem framework.

์ฃผ์š” ์„ฑ๊ณผ:

โ€ข MOOSE-Chem์ด Nature/Science ์ˆ˜์ค€์˜ 51๊ฐœ ํ™”ํ•™ ๋…ผ๋ฌธ์—์„œ ํ•ต์‹ฌ ํ˜์‹ ์„ ํฌํ•จํ•œ ๋†’์€ ์œ ์‚ฌ๋„์˜ ๊ฐ€์„ค ์žฌ๋ฐœ๊ฒฌ ์„ฑ๊ณต

โ€ข 2024๋…„ ์ดํ›„ ๋ฐœํ‘œ๋œ ๋…ผ๋ฌธ์„ ์‚ฌ์šฉํ•˜์—ฌ ๋ฐ์ดํ„ฐ ์˜ค์—ผ ๋ฐฐ์ œ ๋ณด์žฅ

โ€ข LLM์˜ inspiration retrieval ์ž‘์—…์—์„œ ๋†€๋ž๊ฒŒ ๋†’์€ ์ •ํ™•๋„ ๋‹ฌ์„ฑ

โ€ข ์ตœ์ดˆ๋กœ ๊ณผํ•™ ๋ฐœ๊ฒฌ ์ž‘์—…์—์„œ ranking ๋ฌธ์ œ๋ฅผ ์ฒด๊ณ„์ ์œผ๋กœ ์ œ์•ˆํ•˜๊ณ  ํ‰๊ฐ€ ๊ธฐ์ค€ ๊ฐœ๋ฐœ
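The review does not spell out the evaluation criteria for inspiration retrieval or ranking. As a purely hypothetical illustration, two simple measures could look like the sketch below: the fraction of ground-truth inspiration papers that appear among the top-k retrieved candidates, and the position at which the hypothesis closest to the ground truth appears in a ranked list. The function names and the `similarity` mapping are assumptions, not the paper's definitions.

```python
from typing import Sequence


def retrieval_hit_ratio(retrieved: Sequence[str], ground_truth: Sequence[str], k: int) -> float:
    """Hypothetical metric: share of ground-truth inspirations found in the top-k retrieved papers."""
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    top_k = set(retrieved[:k])
    hits = sum(1 for paper in ground_truth if paper in top_k)
    return hits / len(ground_truth)


def rank_of_best_match(ranked_hypotheses: Sequence[str], similarity: dict[str, float]) -> int:
    """Hypothetical metric: 1-based rank of the hypothesis most similar to the ground truth.

    `similarity` maps each generated hypothesis to an assumed (expert or automatic)
    similarity score against the ground-truth hypothesis; ties go to the earlier entry.
    """
    best = max(ranked_hypotheses, key=lambda h: similarity[h])
    return list(ranked_hypotheses).index(best) + 1


if __name__ == "__main__":
    retrieved = ["paper_A", "paper_X", "paper_B", "paper_Y"]
    ground_truth = ["paper_A", "paper_B", "paper_C"]
    print(retrieval_hit_ratio(retrieved, ground_truth, k=3))
    print(rank_of_best_match(["h1", "h2", "h3"], {"h1": 0.2, "h2": 0.9, "h3": 0.5}))
```

A lower rank for the best-matching hypothesis means the ranker surfaced it earlier, which is what a ranking evaluation would presumably reward.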

How


โ€ข ์ˆ˜ํ•™์  ๋ถ„ํ•ด: ๋ฐฐ๊ฒฝ๊ณผ ์˜๊ฐ์œผ๋กœ๋ถ€ํ„ฐ ๊ฐ€์„ค ์ƒ์„ฑ์˜ ๊ตฌ์กฐํ™”๋œ ์ ‘๊ทผ

โ€ข Multi-agent framework: ๋…๋ฆฝ์ ์ธ ์—์ด์ „ํŠธ๋“ค์˜ ํ˜‘๋ ฅ์„ ํ†ตํ•œ ๊ฐ€์„ค ์ƒ์„ฑ

โ€ข Evolutionary algorithm: ๋ฐฐ๊ฒฝ๊ณผ ์˜๊ฐ์˜ ์—ฐ๊ด€์„ฑ์„ ๊ฐ•ํ™”ํ•˜๊ธฐ ์œ„ํ•œ ๋ณ€์ด ์ „๋žต

โ€ข Multi-step retrieval: ๋‹จ์ผ์ด ์•„๋‹Œ ๋‹ค์ค‘ ์˜๊ฐ ๊ฒ€์ƒ‰์„ ํ†ตํ•œ ๋‹ค์–‘์„ฑ ํ™•๋ณด

โ€ข TOMATO-Chem benchmark: 51๊ฐœ ๋…ผ๋ฌธ์˜ ๋ฐฐ๊ฒฝ-์˜๊ฐ-๊ฐ€์„ค ์‚ผ์ค‘ annotation
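The components listed above can be tied together in a toy loop. This is a minimal sketch under stated assumptions, not the authors' implementation: the LLM agents are replaced by a word-overlap score (`overlap`) and string concatenation (`compose`), and `retrieve`, `quality`, `discover`, the beam width, and the mutation labels are hypothetical names chosen for illustration. It shows multi-step retrieval (each step conditions on the current hypothesis), an evolutionary step (several mutated variants per inspiration, keeping the fittest), and ranking of the final hypotheses.

```python
from dataclasses import dataclass, field


@dataclass
class Hypothesis:
    text: str
    inspirations: list[str] = field(default_factory=list)


def overlap(a: str, b: str) -> int:
    """Toy score: number of shared lowercase words (stand-in for an LLM judgment)."""
    return len(set(a.lower().split()) & set(b.lower().split()))


def retrieve(background: str, hyp: Hypothesis, corpus: list[str], top_k: int) -> list[str]:
    """Return the top_k unused inspirations most related to background + current hypothesis."""
    context = f"{background} {hyp.text}"
    unused = [c for c in corpus if c not in hyp.inspirations]
    return sorted(unused, key=lambda c: overlap(context, c), reverse=True)[:top_k]


def compose(hyp: Hypothesis, inspiration: str, mutation: str) -> Hypothesis:
    """Stand-in for an LLM call that merges an inspiration into the hypothesis."""
    step = f"[{mutation}] {inspiration}"
    text = step if not hyp.text else f"{hyp.text} + {step}"
    return Hypothesis(text, hyp.inspirations + [inspiration])


def quality(background: str, hyp: Hypothesis) -> int:
    """Toy fitness used both for mutation selection and for the final ranking."""
    return overlap(background, hyp.text)


def discover(background: str, corpus: list[str], steps: int, beam: int,
             mutations: list[str]) -> list[Hypothesis]:
    """Multi-step search: retrieve, mutate, select; returns hypotheses ranked best-first."""
    population = [Hypothesis("")]
    for _ in range(steps):
        offspring = []
        for hyp in population:
            for insp in retrieve(background, hyp, corpus, beam):
                # Evolutionary step: build one variant per mutation, keep the fittest.
                variants = [compose(hyp, insp, m) for m in mutations]
                offspring.append(max(variants, key=lambda v: quality(background, v)))
        if not offspring:  # corpus exhausted
            break
        # Selection + ranking: keep the `beam` best hypotheses for the next step.
        population = sorted(offspring, key=lambda h: quality(background, h), reverse=True)[:beam]
    return population


if __name__ == "__main__":
    background = "improve lithium battery electrolyte stability"
    corpus = [
        "fluorinated solvent improves electrolyte stability",
        "lithium salt additive for battery",
        "protein folding dynamics",
    ]
    ranked = discover(background, corpus, steps=2, beam=2, mutations=["direct", "analogy"])
    for rank, hyp in enumerate(ranked, start=1):
        print(rank, quality(background, hyp), hyp.text)
```

In a real system each stand-in would presumably be an LLM prompt, and `quality` an LLM-based evaluator rather than word overlap; the beam keeps the search tractable while still allowing several inspirations per hypothesis.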

Originality

• The first systematic study of LLM-based scientific discovery in chemistry

• Mathematical formalization of the background-inspiration decomposition

• The first attempt to explicitly address the ranking task in scientific discovery

• An innovative application of evolutionary algorithms to LLM-based hypothesis generation

Limitation & Further Study

โ€ข ๋ฒค์น˜๋งˆํฌ ๊ทœ๋ชจ: 51๊ฐœ ๋…ผ๋ฌธ์œผ๋กœ๋Š” ํ†ต๊ณ„์  ๊ฐ•๊ฑด์„ฑ์ด ์ œํ•œ์ ์ด๋ฉฐ ๋” ํฐ ๊ทœ๋ชจ์˜ ํ‰๊ฐ€ ํ•„์š”

โ€ข ํ‰๊ฐ€ ๋ฐฉ๋ฒ•๋ก : ์ž๋™ ์œ ์‚ฌ๋„ ๋ฉ”ํŠธ๋ฆญ์˜ ํ•œ๊ณ„๋กœ ์ธํ•ด ์ „๋ฌธ๊ฐ€ ํ‰๊ฐ€์— ์˜์กดํ•˜๋ฏ€๋กœ ํ‰๊ฐ€์˜ ์ฃผ๊ด€์„ฑ ์กด์žฌ

โ€ข ์ผ๋ฐ˜ํ™” ๊ฐ€๋Šฅ์„ฑ: ํ™”ํ•™ ํŠน์ • ํŠน์„ฑ์— ๋Œ€ํ•œ ์˜์กด์„ฑ์œผ๋กœ ์ธํ•ด ๋‹ค๋ฅธ ๊ณผํ•™ ๋ถ„์•ผ๋กœ์˜ ์ง์ ‘ ํ™•์žฅ ์–ด๋ ค์›€

โ€ข LLM ์ง€์‹: ๊ฐ€์„ค ๋ฐœ๊ฒฌ์˜ ์„ฑ๊ณต์ด ํ•™์Šต ๋ฐ์ดํ„ฐ์— ํฌํ•จ๋œ ์œ ์‚ฌ ์—ฐ๊ตฌ์— ๋Œ€ํ•œ ์•”๋ฌต์  ์ง€์‹์— ์˜์กดํ•  ๊ฐ€๋Šฅ์„ฑ

Evaluation

Novelty: 4/5 Technical Soundness: 4/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: ๋ณธ ๋…ผ๋ฌธ์€ ํ™”ํ•™ ๋ถ„์•ผ์—์„œ LLM์˜ ๊ณผํ•™ ๋ฐœ๊ฒฌ ๋Šฅ๋ ฅ์„ ์ฒด๊ณ„์ ์œผ๋กœ ๊ฒ€์ฆํ•œ ์šฐ์ˆ˜ํ•œ ์—ฐ๊ตฌ๋‹ค. ์ˆ˜ํ•™์  ๋ถ„ํ•ด์™€ engineering ๊ธฐ๋ฒ•์˜ ๊ฒฐํ•ฉ, ๊ณ ํ’ˆ์งˆ ๋ฒค์น˜๋งˆํฌ ๊ตฌ์ถ•, ๊ทธ๋ฆฌ๊ณ  ์‹ค์ œ Nature/Science ๋…ผ๋ฌธ์—์„œ์˜ ๊ฐ€์„ค ์žฌ๋ฐœ๊ฒฌ ์„ฑ๊ณต์ด ์ฃผ์š” ๊ฐ•์ ์ด๋‹ค. ๋‹ค๋งŒ ๋ฒค์น˜๋งˆํฌ ๊ทœ๋ชจ์™€ ํ‰๊ฐ€ ๋ฐฉ๋ฒ•๋ก  ์ƒ์˜ ํ•œ๊ณ„๊ฐ€ ์žˆ์œผ๋ฉฐ, LLM์˜ ์ธ์ฝ”๋”ฉ๋œ ์ง€์‹๊ณผ ์‹ค์ œ ์ƒˆ๋กœ์šด ๋ฐœ๊ฒฌ ๋Šฅ๋ ฅ์˜ ๋ถ„๋ฆฌ์— ๋Œ€ํ•œ ๋” ๊นŠ์€ ๋ถ„์„์ด ํ•„์š”ํ•˜๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
๊ณผํ•™ ๋…ผ๋ฌธ์„ ํ•™์Šตํ•œ ๋Œ€๊ทœ๋ชจ ์–ธ์–ด๋ชจ๋ธ ๊ธฐ๋ฐ˜ ๊ณผํ•™ ์ถ”๋ก  ๋ฐฉ๋ฒ•์˜ ์ด๋ก ์  ๊ธฐ๋ฐ˜ ๋ฐ ๋‹ค์–‘ํ•œ ๋„๋ฉ”์ธ ์ ์šฉ์ ์„ ์ดํ•ดํ•˜๋Š” ๋ฐ ์œ ์šฉํ•˜๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
419๋Š” LLM ๊ธฐ๋ฐ˜ ๊ณผํ•™์  ๊ฐ€์„ค ์ƒ์„ฑ์— ์ดˆ์ ์„ ๋งž์ถ˜ ๋…ผ๋ฌธ์œผ๋กœ, MOOSE-Chem์˜ unseen hypothesis discoverability ์ฃผ์ œ์˜ ๊ธฐ๋ฐ˜์ด ๋œ๋‹ค.
Alternative approach
The paper Large Language Models are Zero Shot Hypothesis Proposers explores whether LLMs can generate scientific hypotheses zero-shot, and can be compared with MOOSE-Chem's emergent application.
Alternative approach
Efficient Evolutionary Search Over Chemical Space with Large Language Model Agents likewise focuses on using LLMs to explore unknown hypotheses in chemistry.
Alternative approach
ChemDFM develops an LLM foundation model for chemistry, offering an alternative approach to the problem of rediscovering unseen scientific hypotheses.
Alternative approach
Presents RAG- and knowledge-graph-based LLM hypothesis generation methods for the biomedical domain, so its approach to automating unseen-hypothesis discovery can be compared with the chemistry setting.
Follow-up work
Extends the methodology for exploring unseen scientific hypotheses by applying literature-linking hypothesis derivation methods to LLMs.
Follow-up work
820 establishes reliability and practical validation criteria for scientific hypothesis generation, identifying limitations of MOOSE-Chem's automated hypothesis recommendation along with directions for improvement.
Follow-up work
MOOSE-Chem uses LLMs to recombine ideas and discover innovative hypotheses in chemistry, making it a case of applying Chimera's analysis of creative thinking to real science.
Follow-up work
MOOSE-Chem evaluates the ability of LLMs to rediscover new compounds, extending DrugPlayGround's evaluation framework for drug discovery tasks to real-world experimental validation.
Applications
Like MOOSE-Chem, the paper Scientific discovery in the age of artificial intelligence surveys the practical applicability of LLMs to scientific discovery, particularly in chemistry and biology.
← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •