A Systematic Survey and Benchmark of Deep Learning for Molecular Property Prediction in the Foundation Model Era

Authors: Zongru Li, Xingsheng Chen, Honggang Wen, Regina Qianru Zhang, Ming Li, Xiaojin Zhang, Hongzhi Yin, Qiang Yang, Kwok-Yan Lam, Pietro Lio, Siu-Ming Yiu | Date: 2026-04-17 | URL: https://arxiv.org/abs/2604.16586


Essence

Figure 1: Overview of this survey. The framework organizes more than 100 deep learning methods for molecular

This paper is a systematic benchmark and survey of molecular property prediction. It compares four paradigms (Quantum, Descriptor ML, Geometric DL, and Foundation Model) within a unified taxonomy, points out data-related standardization problems and reproducibility limits, and proposes future directions such as physics-aware learning and uncertainty-calibrated foundation models.

Motivation

Achievement

Figure 2: Taxonomy of existing studies for molecular property prediction

Unified taxonomy: presents a unified taxonomy that links molecular representations, model architectures, and interdisciplinary applications.
Benchmark analysis: a benchmark that includes datasets reflecting an industrial perspective and broad domain coverage.
Problems identified: derives concrete problems in current benchmarks, including stereochemistry inconsistencies, heterogeneous assay sources, and reproducibility limits.
Future directions: proposes three strategic directions: physics-aware learning, uncertainty-calibrated foundation models, and multimodal benchmark ecosystems.
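
One of the directions listed above is uncertainty-calibrated foundation models. As a rough illustration of what "calibrated" could mean for a regression property, the sketch below compares the nominal and empirical coverage of prediction intervals. It assumes a model that returns a mean and a standard deviation per molecule and a Gaussian predictive distribution. Neither assumption comes from the survey, and the function names `interval_coverage` and `mean_calibration_gap` are hypothetical.

```python
from statistics import NormalDist


def interval_coverage(y_true, y_pred, y_std, levels=(0.5, 0.8, 0.95)):
    """Empirical coverage of central Gaussian prediction intervals.

    Assumption (not from the survey): each prediction is a Gaussian with
    mean y_pred[i] and standard deviation y_std[i].
    Returns {nominal_level: fraction of targets inside the interval}.
    """
    coverage = {}
    for level in levels:
        # Half-width of the central interval, in units of standard deviation.
        z = NormalDist().inv_cdf(0.5 + level / 2.0)
        hits = sum(
            abs(t - p) <= z * s for t, p, s in zip(y_true, y_pred, y_std)
        )
        coverage[level] = hits / len(y_true)
    return coverage


def mean_calibration_gap(coverage):
    """Mean absolute difference between nominal and empirical coverage."""
    return sum(abs(emp - nom) for nom, emp in coverage.items()) / len(coverage)


# Toy values, made up purely for illustration.
y_true = [0.0, 1.0, 2.0, 3.0]
y_pred = [0.1, 1.0, 2.5, 6.0]
y_std = [1.0, 1.0, 1.0, 1.0]

cov = interval_coverage(y_true, y_pred, y_std)
gap = mean_calibration_gap(cov)
print(cov, gap)
```

For a well-calibrated model the empirical coverage at each level should be close to the nominal level, so the gap should be close to zero. A model whose intervals are too narrow shows coverage below nominal at the high levels.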

How

Figure 3: The pipeline of deep learning-driven MPP

Originality

Limitation & Further Study

Existing limitations: methods for uncertainty quantification and reliability assessment are not yet sufficiently developed; the computational cost and memory requirements of foundation models are insufficiently addressed; the gap between real-world experimental data and computational predictions remains unresolved.
Further study: physics-aware GNN design needs standardization; empirical comparisons of uncertainty quantification techniques are lacking; time-aware and scaffold-aware split strategies need validation on industrial datasets; the interpretability of multimodal fusion mechanisms needs improvement.
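
The scaffold-aware and time-aware split strategies mentioned above can be sketched as follows. This is a minimal illustration, not the survey's protocol. It assumes scaffold keys have already been computed (for example as Bemis-Murcko scaffold strings from a cheminformatics toolkit) and that each record carries an ISO-format date; the function names `scaffold_split` and `time_split` and all data values are hypothetical.

```python
from collections import defaultdict


def scaffold_split(scaffolds, test_fraction=0.2):
    """Group-aware split: molecules sharing a scaffold key stay on one side.

    `scaffolds` is a list of precomputed scaffold keys (assumed input).
    Returns (train_indices, test_indices).
    """
    groups = defaultdict(list)
    for idx, key in enumerate(scaffolds):
        groups[key].append(idx)
    # Largest groups are placed first, so rare scaffolds tend to land in test.
    ordered = sorted(groups.values(), key=lambda g: (-len(g), g[0]))
    train_cap = len(scaffolds) - int(round(test_fraction * len(scaffolds)))
    train, test = [], []
    for group in ordered:
        if len(train) + len(group) <= train_cap:
            train.extend(group)
        else:
            test.extend(group)
    return sorted(train), sorted(test)


def time_split(dates, cutoff):
    """Time-aware split: train on records before `cutoff`, test on the rest.

    ISO 'YYYY-MM-DD' strings compare correctly as plain strings.
    """
    train = [i for i, d in enumerate(dates) if d < cutoff]
    test = [i for i, d in enumerate(dates) if d >= cutoff]
    return train, test


# Toy inputs, made up purely for illustration.
scaffolds = ["A", "A", "A", "B", "B", "C", "D", "A", "E", "B"]
s_train, s_test = scaffold_split(scaffolds, test_fraction=0.2)

dates = ["2021-03-01", "2023-07-15", "2022-01-10", "2024-02-02"]
t_train, t_test = time_split(dates, cutoff="2023-01-01")
print(s_train, s_test, t_train, t_test)
```

Both splits aim to make the test set differ from the training set the way new data would: by unseen chemotypes in the scaffold case, and by later measurement date in the time case. A random split gives no such guarantee.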

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: The review describes this paper as the first comprehensive survey of molecular property prediction. It analyzes the four paradigms in an integrated way, clearly identifies problems in data management and reproducibility, and sets out clear future directions. In particular, the physics-aware learning and uncertainty-calibrated foundation model directions have strategic value that can be put to use immediately in both academia and industry. However, the lack of concrete technical validation or prototype implementations for the three proposed directions is a limitation.

Related Papers

Foundational work
Its broad benchmarking of deep learning approaches to molecular generation and its analysis of real-world performance limits provide an important basis for assessing the effectiveness of structure-based tokenization research.
Foundational work
A systematic benchmarking review of deep-learning-based mass spectrum prediction tools; it helps establish the need for and direction of the FlexMS framework.
Foundational work
A systematic benchmark study of deep learning for molecular property prediction; it provides a reference for fairly comparing the performance of domain adaptation and baseline models.
Foundational work
Paper 2997 covers performance evaluation of molecule-level ML models and the methodology of benchmark construction, giving broader theoretical context to paper 3035's discussion of real wet-lab evaluation and rigorous benchmark construction.
Foundational work
A paper focused on benchmarking molecular docking and re-ranking methods; it discusses the same standardized evaluation approach, so the two are useful to compare.
Foundational work
Provides a general discussion of the effects of data distribution and bias, and of benchmarking methods, in molecular design and in drug and peptide generation.
Alternative approach
Paper 344 presents an overall classification and benchmark framework for bioinformatics-specific foundation models, which contrasts in context with paper 2997's benchmarking of molecular property prediction.
Alternative approach
Proposes a framework for generalization and benchmarking of deep neural networks on small datasets.
Alternative approach
Paper 3227 covers neural-network-based real-time prediction in plasma control and similar settings, offering a point of comparison for the generality and limits of ML-based methods in molecular property prediction.
Alternative approach
A deep learning benchmark paper on molecular property prediction; it compares different data and methodologies and offers an independent perspective.
Alternative approach
Paper 2997 benchmarks the limits and potential of deep-learning-based approaches in molecular design and prediction, and can be contrasted with paper 3027's generalization evaluation based on DEL data.
Alternative approach
Paper 2997 broadly covers performance benchmarking of deep learning methods for molecular docking and drug discovery, and can be contrasted with paper 3036's evaluation framework.
Follow-up work
Presents a perspective on foundation models in chemistry, connecting to the introduction of the molecular property prediction benchmark paper.
Follow-up work
Paper 346 addresses benchmarks, data efficiency, and reproducibility limits from the perspective of physics-aware paradigms, deepening the practical implications of paper 2997.
Follow-up work
Paper 2998, through GNN-based transfer learning for molecular orbital prediction, gives a concrete example of paper 2997's systematic molecular prediction benchmark and of performance improvement.
Application
Paper 2997 provides a benchmark of various DL molecular prediction models, which can be used for objective comparison of GNN molecular orbital prediction performance.