Generative design of intrinsically disordered proteins based on conditioned protein language models: Data is the limit

Authors: | Date: 2026-04-14 | URL: https://www.biorxiv.org/content/10.64898/2026.04.14.718363v1


Essence

Figure 1: A schematic overview of the architecture and conditioning strategy of the IDP-Prop2seq model.

This paper proposes IDR-Prop2Seq, a Transformer encoder-decoder architecture built on a protein language model, which generates intrinsically disordered region (IDR) sequences conditioned on target conformational ensemble descriptors. The key finding is that precise control over conformational and physicochemical properties is achieved only at large data scale (about ten million bacterial IDR sequences), which shows that data availability is the main limit on IDR design models.

Motivation

Achievement

Figure 2: Distribution of absolute error statistics for Rg and Ree. Violin plots showing the distributions
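For readers unfamiliar with the two descriptors named in Figure 2, the following is a minimal sketch of how a radius of gyration (Rg), an end-to-end distance (Ree), and an absolute error against a target value can be computed for a single conformation. It assumes equal-mass beads, one bead per residue, and uses a hypothetical toy chain; the function names are illustrative, and the paper's own pipeline (including how ensemble averages are obtained for generated sequences) is not reproduced here.

```python
import math

def radius_of_gyration(coords):
    """Rg of one conformation, treating every residue as an equal-mass bead."""
    n = len(coords)
    # Geometric center of the chain (equal masses assumed).
    center = [sum(p[k] for p in coords) / n for k in range(3)]
    # Mean squared distance of the beads from that center.
    mean_sq = sum(math.dist(p, center) ** 2 for p in coords) / n
    return math.sqrt(mean_sq)

def end_to_end_distance(coords):
    """Ree: distance between the first and last bead of the chain."""
    return math.dist(coords[0], coords[-1])

def absolute_error(target, achieved):
    """Absolute deviation of an achieved descriptor from its target value."""
    return abs(target - achieved)

# Hypothetical toy conformation: four beads on a straight line, 1 unit apart.
toy = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
rg = radius_of_gyration(toy)
ree = end_to_end_distance(toy)
rg_error = absolute_error(2.0, rg)  # 2.0 is an arbitrary illustrative target
```

In practice both descriptors would be averaged over many conformations of an ensemble before being compared with the conditioning target; the single-conformation version above only illustrates the definitions.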

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 5/5 | Overall: 4/5

Overall assessment: This is a strong study that applies an encoder-decoder conditional generative model to the underexplored problem of IDR design and systematically demonstrates how data scale affects generative model performance. Its use of a Transformer architecture and a large-scale computational dataset is technically sound, and the data-centric paradigm it puts forward is meaningful. However, the generated sequences lack experimental validation and functional confirmation, and it remains open whether the current data scale is sufficient and which further experiments are needed. Experimental validation and larger datasets will be essential going forward.

Papers worth reading alongside

Foundational work
Recent work on controllable sequence generation for biopolymers such as DNA and proteins.
Alternative approach
Covers sequence modeling and design from the molecular to the genome scale; comparable to IDR-Prop2Seq as a practical case of broadening the scope of sequence design.
Alternative approach
Focuses on structure-based generative design of target peptides, showing a different direction in generative protein design.
Alternative approach
3285 likewise addresses the generation of diverse structural ensembles for intrinsically disordered regions (IDRs), so it is useful for comparing the methodology and data requirements of the two approaches.
Alternative approach
Designs protein structure and dynamics with generative AI, exploring protein-RNA condensation mechanisms and molecular dynamics from a different angle.
Follow-up work
The unsupervised protein sequence diversity model of 3253 is closely related to the conditional generation and property control task of 3116.
Follow-up work
Develops a reinforcement-learning-based generative protein LLM; a follow-up study on controllable protein design.
Application
Applies generative AI to disordered protein sequence design and shows a practical application to phase-separation motif prediction.