MolHIT: Advancing Molecular-Graph Generation with Hierarchical Discrete Diffusion Models

Authors: | Date: 2026-02-19 | URL: https://arxiv.org/abs/2602.17602


Essence

Figure 1. MolHIT achieves a SOTA result on the MOSES dataset.

MolHIT is a SOTA model that combines a Hierarchical Discrete Diffusion Model with Decoupled Atom Encoding to reach near-perfect chemical validity (99.1%) in molecular graph generation, delivering the validity of 1D sequence models and the structural novelty of 2D graph models at the same time.

Motivation

Achievement

How

Figure 2. Overview of MolHIT. (a) Markov chain of Hierarchical Discrete Diffusion Model (HDDM). Clean states (S0) are tr

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: MolHIT elegantly addresses a long-standing problem in molecular graph generation by pairing a hierarchical diffusion model with a redesigned atom encoding, and it presents a solid contribution backed by strong experimental evidence and theoretical justification.

Related Papers Worth Reading

Foundational work
MolGAN is a classic generative model for molecular graph generation and serves as a theoretical baseline against which MolHIT's hierarchical diffusion modeling can be compared.
Foundational work
The MolHIT paper discusses molecular graph generation and hierarchical structure modeling, providing a basis for generative structure-sequence design methods that could be used in DISCO.
Alternative approach
The ReviewEval paper uses review-based quantitative evaluation to assess how novel and valid AI-generated molecules are in medicinal chemistry assessments, complementing the evaluation perspective on generated molecules.
Alternative approach
The Foundation models in bioinformatics paper covers foundation-model approaches to molecular representation and generation, and can be read in contrast with MolHIT.
Alternative approach
A study that takes a different, equivariant neural network approach to 3D molecular generation and representation learning.
Alternative approach
Allows a side-by-side comparison between MolHIT, which approaches hierarchical generation of molecular graphs with a new algorithm, and techniques that combine discrete and continuous generation.
Alternative approach
An alternative generative model for property prediction and inverse design of crystalline materials.
Follow-up work
Further develops the combination of diffusion and sequence-to-structure networks for chemical reaction prediction and synthesis route search.
Follow-up work
Goes beyond MolHIT's graph-based generation and can be extended toward generative protein sequence design that integrates attention-based signal interpretation and motif generation.
Follow-up work
The Reward-Guided Discrete Diffusion paper proposes a technique that uses rewards to guide generation in diffusion models, making it a natural extension of MolHIT.
Follow-up work
A hierarchical molecular dynamics generation and tokenization model that illustrates possible extensions and applications of existing graph diffusion models.