SIGMA: Structure-Invariant Generative Molecular Alignment for Chemical Language Models via Autoregressive Contrastive Learning

Authors: | Date: 2026-03-26 | URL: https://arxiv.org/abs/2603.25062


Essence

Figure 1. Conceptual comparison between Global Contrastive Learning and our proposed Structure-Invariant Autoregressive

To address the modality mismatch that SMILES linearization introduces in Chemical Language Models, the paper proposes token-level contrastive learning (SIGMA) and an isomorphic beam search (IsoBeam) to enforce structural invariance.

Motivation

Achievement

How

Figure 2. Token-level contrastive supervision across equivalent SMILES sequences. Two structurally equivalent SMILES (to
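To make the token-level idea in the caption above concrete, here is a minimal sketch of a generic InfoNCE-style contrastive loss. It is not the paper's actual objective. It assumes that row i of `view_a` and row i of `view_b` are hidden states taken at positions that some alignment has marked as structurally equivalent in two SMILES strings of the same molecule. The function name `info_nce`, the toy vectors, and the temperature value are all illustrative assumptions.

```python
import math


def info_nce(anchors, positives, temperature=0.1):
    """Generic InfoNCE loss: anchors[i] should match positives[i].

    Hypothetical helper for illustration only; every other row of
    `positives` acts as a negative for anchors[i].
    """
    def normalize(vec):
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0
        return [x / norm for x in vec]

    a = [normalize(v) for v in anchors]
    p = [normalize(v) for v in positives]

    total = 0.0
    for i, a_i in enumerate(a):
        # Cosine similarity of anchor i to every candidate, scaled by temperature.
        logits = [sum(x * y for x, y in zip(a_i, p_j)) / temperature for p_j in p]
        # Numerically stable log-sum-exp for the softmax denominator.
        peak = max(logits)
        log_denom = peak + math.log(sum(math.exp(l - peak) for l in logits))
        # Cross-entropy with the aligned position as the correct class.
        total += log_denom - logits[i]
    return total / len(a)


# Toy hidden states (assumed): three aligned positions in two equivalent strings.
view_a = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
view_b = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]

aligned_loss = info_nce(view_a, view_b)
# Reversing the rows breaks the alignment for the first and last positions.
shuffled_loss = info_nce(view_a, list(reversed(view_b)))
print(aligned_loss, shuffled_loss)
```

The loss is low when states at equivalent positions agree and high when they do not, which is the kind of pressure a token-level contrastive term would put on the model. How the equivalent positions are actually identified is left open here.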

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: This paper elegantly addresses the fundamental modality mismatch in Chemical Language Models through explicit token-level contrastive learning, and rounds out the contribution with IsoBeam, a practical inference algorithm. The theoretical argument that structural invariance is achieved needs to be made more rigorous, and a computational complexity analysis should be added. Even so, the work is a meaningful step toward resolving the trade-off between the scalability of sequence models and geometric rigor.

Related Papers Worth Reading

Foundational work
Provides the methodological foundation for structural representation learning in SMILES-based chemical language models.
Foundational work
Underlies language-model-based approaches to molecular generation and representation.
Alternative approach
Presents an alternative contrastive learning method for improving the structural consistency of molecular representations.
Alternative approach
Presents a similar approach to the molecular representation alignment problem in chemical language models.
Alternative approach
A multimodal molecular reasoning language model that takes a different approach to linearization mismatch and the multiple-representation problem for molecules.
Follow-up work
Could further explore the interpretability of biological mechanisms and the use of contrastive learning in molecular structure generation and alignment.
Application
A paper on DNA sequence generation and controlled search that shows practical applications of biological and chemical language models, including the SMILES mismatch problem.