Harnessing Large Language Models to Collect and Analyze Metal–Organic Framework Property Data Set

Authors: Yeonghun Kang, Wonseok Lee, Taeun Bae, Seunghee Han, Huiwon Jang, Jihan Kim | Date: 2025-02-05 | DOI: 10.1021/jacs.4c11085


Essence

Figure 1

Figure 1. (a) Overall schematic of the L2M3 model. (b) Overall process of table mining. (c) Overall process of text mini

This paper develops L2M3, a system that uses LLMs to systematically extract and structure experimental data from more than 40,000 MOF-related papers. The authors then use the extracted data to analyze the relationship between synthesis conditions and material properties and to build a synthesis-condition recommendation system.
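To make the idea of "structured data feeding a recommender" concrete, here is a minimal sketch in Python. It is not the L2M3 schema or the authors' recommendation method, neither of which is described in this review. The `SynthesisRecord` fields, the `recommend_conditions` helper, and every value in `records` are hypothetical placeholders, and the recommender is a plain frequency-and-mean lookup swapped in for illustration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class SynthesisRecord:
    """One structured record of the kind a mining step might emit (hypothetical schema)."""
    mof_name: str
    metal_precursor: str
    organic_linker: str
    solvent: str
    temperature_c: float
    time_h: float


def recommend_conditions(records, metal_precursor, organic_linker):
    """Summarize past records that share the given precursor and linker.

    Returns the most frequent solvent plus the mean temperature and time,
    or None when no record matches. This is a toy baseline, not the
    paper's recommender.
    """
    matches = [
        r for r in records
        if r.metal_precursor == metal_precursor
        and r.organic_linker == organic_linker
    ]
    if not matches:
        return None
    solvent = Counter(r.solvent for r in matches).most_common(1)[0][0]
    return {
        "solvent": solvent,
        "temperature_c": sum(r.temperature_c for r in matches) / len(matches),
        "time_h": sum(r.time_h for r in matches) / len(matches),
        "n_records": len(matches),
    }


# Placeholder records for illustration only; these values are NOT from the paper.
records = [
    SynthesisRecord("MOF-A", "Zn(NO3)2", "H2BDC", "DMF", 100.0, 24.0),
    SynthesisRecord("MOF-A", "Zn(NO3)2", "H2BDC", "DMF", 120.0, 48.0),
    SynthesisRecord("MOF-A", "Zn(NO3)2", "H2BDC", "DEF", 110.0, 24.0),
    SynthesisRecord("MOF-B", "Cu(NO3)2", "H3BTC", "ethanol", 85.0, 12.0),
]

if __name__ == "__main__":
    print(recommend_conditions(records, "Zn(NO3)2", "H2BDC"))
```

A real system would need to normalize chemical names and units before matching, which is where most of the extraction effort is likely to go.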

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: Developing and actually deploying a large-scale, automated, LLM-based data-mining system is highly practical and innovative for materials science. Extracting structured data from 40,000 papers, releasing it as a public dataset, and going on to build a synthesis-condition recommendation system is an immediate contribution to the MOF field. That said, extraction accuracy needs to improve and the gap analysis needs to go deeper.

Papers worth reading alongside

Foundational work
Covers the role and limitations of LLMs in materials science and the need to couple them with knowledge bases, which helps explain both the effectiveness of L2M3's data extraction and the synergy gained when the two are combined.
Foundational work
A case that actually carries out retrieval-augmented generation based materials design, building on MOF literature data mining and condition recommendation.
Alternative approach
Differs in information-extraction strategy (MOF experimental data extraction versus tokenization) yet shares a similar goal (preserving 3D information), so the two papers are complementary.
Alternative approach
Presents 34 examples of LLM use in materials science, showing the range of directions that data-extraction research can take in specialized areas such as MOFs.
Follow-up work
The SciCap paper shows a downstream use of LLM-based information extraction and structuring through caption generation for scientific figures.
Follow-up work
The ActionIE paper focuses on extracting experimental actions from scientific papers, which complements research on extracting MOF synthesis conditions.
Follow-up work
The Harnessing Large Language Models to Collect and Analyze Meta... paper is directly linked as a concrete extension of MOF literature analysis.
Follow-up work
Goes a step beyond extracting and using MOF experimental data and applies it to retrieval-augmented generation based materials design.
Follow-up work
Presents a use of LLMs for proteome-wide generation tasks on biological molecules, broadening the scope of information extraction and of building data foundations.
Application
Can contribute to practical information-processing applications such as structuring and extracting MOF experimental data.
Counterpoint/critique
Lays out the limits of, and future directions for, continual self-improvement of LLMs in scientific information extraction and analysis, and discusses the practical problems of large-scale data mining.