Reward-Guided Discrete Diffusion via Clean-Sample Markov Chain for Molecule and Biological Sequence Design

Authors: | Date: 2026-02-10 | URL: https://arxiv.org/abs/2602.09424


Essence

Figure 1. Left: In scientific applications, the rewards defined on discrete spaces are highly sensitive to small perturb

This paper proposes the Clean-Sample Markov Chain (CSMC) Sampler to improve test-time reward-guided sampling for discrete diffusion models. It uses the Metropolis-Hastings algorithm to build a Markov chain made up only of clean samples, so no intermediate reward computation is needed, and it achieves better performance on molecule and biological sequence generation.
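To make the idea of a Markov chain over clean samples concrete, here is a minimal sketch of generic Metropolis-Hastings on complete sequences. It is not the paper's sampler: this review does not give the CSMC proposal or acceptance rule, so the sketch swaps in a symmetric single-site mutation proposal and a uniform prior. The names `ALPHABET`, `reward`, `propose`, and `mh_clean_chain`, as well as the temperature `alpha`, are all hypothetical. The point it illustrates is that the reward is only ever evaluated on full sequences, never on partially noised ones.

```python
import math
import random

ALPHABET = "ACGT"  # hypothetical toy vocabulary, not from the paper


def reward(seq):
    """Hypothetical reward: number of 'G' symbols in a complete sequence."""
    return seq.count("G")


def propose(seq, rng):
    """Symmetric proposal (an assumption): change one position to a different symbol."""
    i = rng.randrange(len(seq))
    choices = [c for c in ALPHABET if c != seq[i]]
    return seq[:i] + rng.choice(choices) + seq[i + 1:]


def mh_clean_chain(x0, steps, alpha, seed=0):
    """Generic Metropolis-Hastings over complete sequences only.

    Assumed target: p(x) proportional to exp(reward(x) / alpha) with a
    uniform prior. Because the proposal is symmetric, the acceptance
    probability reduces to min(1, exp((r(x') - r(x)) / alpha)).
    """
    rng = random.Random(seed)
    x = x0
    chain = [x]
    for _ in range(steps):
        cand = propose(x, rng)
        log_ratio = (reward(cand) - reward(x)) / alpha
        # Accept uphill or lateral moves always, downhill moves with prob exp(log_ratio).
        if log_ratio >= 0 or rng.random() < math.exp(log_ratio):
            x = cand
        chain.append(x)
    return chain


if __name__ == "__main__":
    chain = mh_clean_chain("AAAAAAAA", steps=500, alpha=0.5, seed=0)
    print(chain[-1], reward(chain[-1]))
```

Every state in `chain` is a full sequence, so a reward defined only on valid, complete objects can be used directly. A diffusion-based sampler would replace `propose` with a model-driven proposal and add the corresponding proposal-ratio term to `log_ratio`.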

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: The CSMC Sampler is a creative solution to the reward-guided sampling problem for discrete diffusion models. By building the Markov chain from clean samples, it elegantly sidesteps the non-smooth reward functions typical of scientific domains and shows consistent performance gains on molecule and biological sequence generation. It would be stronger still with a more thorough theoretical convergence analysis and a detailed analysis of computational complexity.

Related Papers

Foundational work
A paper on refining reward-guided diffusion models (Iterative Distillation); it explains a fundamental connection to the Clean-Sample Markov chain sampling strategy.
Foundational work
The paper "Reward-guided iterative refinement in diffusion models" systematically covers the theory and practical limitations of test-time reward-based sampling and optimization.
Alternative approach
Work on scientific paper summarization and structured idea proposal, where differences in reward-guided generation and structural design can be seen.
Alternative approach
Reward-Guided Discrete Diffusion is a recent study that tackles the similar problem of reward-based diffusion fine-tuning with a different formulation.
Alternative approach
A study of flow-based generation of amorphous materials, covering a generative modeling approach other than discrete diffusion.
Alternative approach
The DiffSyn paper addresses materials synthesis with diffusion models and presents a new sampling framework that can be compared with the CSMC Sampler.
Alternative approach
Another approach to Reward-Guided Discrete Diffusion; it allows a comparison of reward-based diffusion modeling performance and design choices on molecular design problems.
Alternative approach
Covers the principles of reward-guided discrete diffusion in molecule generation, including chemical synthesis, so it can be directly compared and cross-referenced with the reward-based approach of MP2D.
Follow-up work
Reward-based refinement and search-control techniques for generative models, such as those for molecular design, can be linked to practical implementation cases.
Follow-up work
The Reward-Guided Discrete Diffusion paper proposes a technique that uses rewards to guide generation in diffusion models, placing it as an extension of MolHIT.
Application
The reward-adjustment scheme for molecular-design diffusion models in 682 connects well to its practical application in the reinforcement-reward-based discrete molecule generation problem of 3233.