Reinforcement-guided generative protein language models enable de novo design of highly diverse AAV capsids

Authors: | Date: 2026-03-19 | URL: https://arxiv.org/abs/2603.19473


Essence

Figure 1 | Overview of training and sequence generation strategies. The pre-trained, general-purpose ProGen model

The authors develop a framework that fine-tunes a protein language model (PLM) and guides it with reinforcement learning to generate diverse, novel AAV capsid sequences that lie outside the training distribution.
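The general idea of steering a sequence generator with a reward signal can be illustrated with a toy policy-gradient (REINFORCE) loop. This is a minimal sketch under stated assumptions, not the method used in the paper: the "model" is a table of independent per-position logits standing in for a fine-tuned PLM, the reference peptide `ACDEFGHI` is arbitrary, and the reward (fraction of positions differing from the reference) is a hypothetical proxy for novelty. All function names are illustrative.

```python
import math
import random

ALPHABET = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def init_logits(reference, bias=2.0):
    """Toy stand-in for a fine-tuned model: each position favors the reference residue."""
    return [[bias if k == r else 0.0 for k in range(len(ALPHABET))] for r in reference]


def expected_novelty(logits, reference):
    """Expected fraction of positions differing from the reference (hypothetical reward)."""
    return sum(1.0 - softmax(row)[r] for row, r in zip(logits, reference)) / len(reference)


def reinforce_step(logits, reference, rng, batch=32, lr=2.0):
    """One REINFORCE update with a mean-reward baseline (gradient ascent on reward)."""
    probs = [softmax(row) for row in logits]  # policy is held fixed while sampling
    samples = [
        [rng.choices(range(len(ALPHABET)), weights=p)[0] for p in probs]
        for _ in range(batch)
    ]
    rewards = [
        sum(a != r for a, r in zip(s, reference)) / len(reference) for s in samples
    ]
    baseline = sum(rewards) / batch
    for s, reward in zip(samples, rewards):
        advantage = (reward - baseline) / batch
        for pos, a in enumerate(s):
            for k in range(len(ALPHABET)):
                # d log p(a) / d logit_k = 1[k == a] - p_k
                grad = (1.0 if k == a else 0.0) - probs[pos][k]
                logits[pos][k] += lr * advantage * grad


def train(reference, steps=200, seed=0):
    """Run the toy loop; return expected novelty before and after, plus final logits."""
    rng = random.Random(seed)
    logits = init_logits(reference)
    before = expected_novelty(logits, reference)
    for _ in range(steps):
        reinforce_step(logits, reference, rng)
    return before, expected_novelty(logits, reference), logits


if __name__ == "__main__":
    ref = [ALPHABET.index(c) for c in "ACDEFGHI"]  # arbitrary toy peptide
    before, after, _ = train(ref)
    print(f"expected novelty before: {before:.3f}, after: {after:.3f}")
```

In this toy setting the reward only pushes sequences away from the reference. A realistic setup would also need a term that keeps generated sequences viable, which is omitted here.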

Motivation

Achievement

Figure 4 | Sequence embedding-based analysis of novelty and viability. Unique sequences from each design

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 3/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

์ดํ‰: PLM๊ณผ reinforcement learning์„ ์ฐฝ์˜์ ์œผ๋กœ ๊ฒฐํ•ฉํ•˜์—ฌ ์ƒ์„ฑ ๋ชจ๋ธ์˜ ๊ทผ๋ณธ์  ํ•œ๊ณ„์ธ ํ•™์Šต ๋ถ„ํฌ ํŽธํ–ฅ์„ ํ•ด๊ฒฐํ•˜๊ณ  AAV ์บก์‹œ๋“œ ์ƒ๋ฌผ๊ณตํ•™์— ์‹ค์งˆ์ ์œผ๋กœ ์ ์šฉ ๊ฐ€๋Šฅํ•œ ์ฒด๊ณ„์  ํ”„๋ ˆ์ž„์›Œํฌ๋ฅผ ์ œ์‹œํ•œ ์šฐ์ˆ˜ํ•œ ์—ฐ๊ตฌ์ด๋‚˜, ์‹คํ—˜์  ๊ฒ€์ฆ์ด ์„ ํ–‰๋˜์–ด์•ผ ์ž„์ƒ ์ ์šฉ ๊ฐ€๋Šฅ์„ฑ์„ ํŒ๋‹จํ•  ์ˆ˜ ์žˆ๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
๋‹จ๋ฐฑ์งˆ/๋ถ„์ž ์„œ์—ด์—์„œ ๋ถˆ์•ˆ์ • ํŠน์ด์ (unstable singularities)์„ ํƒ๊ตฌํ•˜์—ฌ, PLM์˜ ์ƒ˜ํ”Œ ๊ณต๊ฐ„ ํ™•์žฅ์— ๊ฐœ๋…์  ๊ธฐ๋ฐ˜์„ ์ œ๊ณตํ•œ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
dual expert ๊ตฌ์กฐ์™€ diffusion ์ƒ์„ฑ ๋ชจ๋ธ์„ ๋ช…์‹œ์ ์œผ๋กœ ๊ฒฐํ•ฉํ•จ์œผ๋กœ์จ 3228์˜ RL-๊ฐ€์ด๋“œ PLM ๊ธฐ๋ฐ˜ ์„œ์—ด ์ƒ์„ฑ์˜ ์ด๋ก ์  ํ† ๋Œ€๋ฅผ ์ œ๊ณตํ•œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
De novo protein design์„ RFdiffusion ๊ธฐ๋ฐ˜์œผ๋กœ ์ ‘๊ทผํ•˜์—ฌ, RL ์•„๋‹Œ diffusion ๊ธฐ๋ฐ˜ ์ƒ์„ฑ์˜ ์„ฑ๊ณผ๋ฅผ ๋น„๊ตํ•  ์ˆ˜ ์žˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๋‹จ๋ฐฑ์งˆ ์„œ์—ด ์ƒ์„ฑ์„ ์œ„ํ•œ ์–ธ์–ด ๋ชจ๋ธ ๊ธฐ๋ฐ˜ ์ ‘๊ทผ๋ฒ•์— ๋Œ€ํ•œ ์œ ์‚ฌํ•œ ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฐ˜ ์ƒ์„ฑํ˜• ๋‹จ๋ฐฑ์งˆ ์–ธ์–ด๋ชจ๋ธ ๋ฒค์น˜๋งˆํ‚น ๋…ผ๋ฌธ์œผ๋กœ, ์ƒ˜ํ”Œ๋ง ์ „๋žต๊ณผ ํƒ์ƒ‰ ๊ณต๊ฐ„์„ ํ™•์žฅํ•˜๋Š” ๋‹ค์–‘ํ•œ ์ ‘๊ทผ์„ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
3228 ๋…ผ๋ฌธ์€ ๊ฐ•ํ™”ํ•™์Šต์„ ๊ฒฐํ•ฉํ•œ ๋‹จ๋ฐฑ์งˆ/ํ•ต์‚ฐ ์–ธ์–ด ๋ชจ๋ธ ์ƒ์„ฑ๊ธฐ๋ฅผ ์ œ์‹œํ•˜๋ฉฐ, ๋ฐ˜๋ณต์  ์ตœ์ ํ™” ์ธก๋ฉด์—์„œ 3138 ๋…ผ๋ฌธ๊ณผ ๋Œ€์กฐ๋ฉ๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๋ถ„์ž ๋ฐ ๋‹จ๋ฐฑ์งˆ ์„ค๊ณ„๋ฅผ ์œ„ํ•œ ์–ธ์–ด ๋ชจ๋ธ ๊ธฐ๋ฐ˜ ์ƒ์„ฑ ํ”„๋ ˆ์ž„์›Œํฌ์— ๋Œ€ํ•œ ์œ ์‚ฌํ•œ ์—ฐ๊ตฌ์ด๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ๋‹จ๋ฐฑ์งˆ ์ฝ”๋ˆ ์ตœ์ ํ™” ๋ฐฉ๋ฒ•์œผ๋กœ, RL ๊ธฐ๋ฐ˜ ๋‹ค์–‘ํ•œ ์„œ์—ด ์ƒ์„ฑ ๋Œ€์•ˆ ๋ฐฉ์‹์— ๋Œ€ํ•œ ๋น„๊ต์ ์„ ์ œ๊ณตํ•œ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
์ƒ๋ฌผ์ •๋ณดํ•™ ๊ธฐ๋ฐ˜์˜ ํŒŒ์šด๋ฐ์ด์…˜ ๋ชจ๋ธ ์„œ๋ฒ ์ด๋กœ, ๊ฐ•ํ™”ํ•™์Šต ๊ฐ€์ด๋“œ PLM์˜ ํ•ด์„/ํ™•์žฅ ์ ์šฉ ๋ฒ”์œ„๋ฅผ ๋งฅ๋ฝํ™”ํ•˜์—ฌ ์ดํ•ดํ•  ์ˆ˜ ์žˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
PLM์„ ๊ฐ•ํ™”ํ•™์Šต์œผ๋กœ ๊ฐ€์ด๋“œํ•˜์—ฌ ์ข… ํŠน์ด์  ๋‹จ๋ฐฑ์งˆ ์„œ์—ด์„ ์ƒ์„ฑํ•˜๋Š” ๋“ฑ 3223์ด ์ œ์•ˆํ•œ ๊ตฌ์กฐยท์ฝ”๋ˆ ์ตœ์ ํ™” ๋ฌธ์ œ๋ฅผ ํ™•์žฅํ•œ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
๊ฐ•ํ™”ํ•™์Šต์„ ํ™œ์šฉํ•œ ์ƒ์„ฑํ˜• ๋‹จ๋ฐฑ์งˆ ์–ธ์–ด๋ชจ๋ธ์ด ๋งž์ถค ์„ค๊ณ„ ๊ฐœ๋…์„ ํ™•์žฅํ•˜์—ฌ ๋ฏธ๋ž˜ ์—ฐ๊ตฌ๋ฐฉํ–ฅ์„ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
๊ฐ•ํ™”ํ•™์Šต ๊ธฐ๋ฐ˜ ์ƒ์„ฑํ˜• ๋‹จ๋ฐฑ์งˆ LLM ๊ฐœ๋ฐœ๋กœ, ์ œ์–ด ๊ฐ€๋Šฅํ•œ ๋‹จ๋ฐฑ์งˆ ์„ค๊ณ„์— ๊ด€ํ•œ ํ›„์† ์—ฐ๊ตฌ์ž…๋‹ˆ๋‹ค.
์‘์šฉ ์‚ฌ๋ก€
Virtual Biotech ๋…ผ๋ฌธ์€ ์‹ค์ œ ์‹ ์•ฝ ํƒ์ƒ‰์—์„œ ๋‹ค์ค‘ ์—์ด์ „ํŠธ ๊ธฐ๋ฐ˜ ํ”„๋ ˆ์ž„์›Œํฌ๋กœ agentic protein design์˜ ์‹ค์ œ ์ ์šฉ ์˜ˆ์‹œ๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.