Unsupervised protein language models learn patterns of enzyme function

Authors: | Date: 2026-04-23 | URL: https://www.biorxiv.org/content/10.64898/2026.04.23.720319v1


Essence

Figure 1

Figure 1: Mining embedding space using PLM-clust: providing access to novel enzymes with minimal

The paper proposes PLM-clust, an algorithm that combines mean-pooled embeddings (MPE) from the protein language model ESM2 with k-means clustering. The result is an unsupervised framework that starts from a set of known enzymes and, through iterative rounds of experiments, discovers novel enzymes with a target function. For xylanases and imine reductases, it achieved more than 100-fold improvements in activity or catalytic diversity.
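To make the two ingredients concrete, here is a minimal Python sketch of mean pooling followed by k-means. It rests on stated assumptions: the 2-D per-residue vectors are toy stand-ins for real ESM2 outputs, and the names `mean_pool`, `kmeans`, and `toy_residue_embeddings` are hypothetical, not taken from the paper's code. It illustrates the general technique rather than the authors' implementation.

```python
import math
import random


def mean_pool(vectors):
    """Average a list of equal-length vectors into one vector."""
    n = len(vectors)
    dim = len(vectors[0])
    return [sum(vec[d] for vec in vectors) / n for d in range(dim)]


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = mean_pool(members)
        if new_labels == labels:
            break
        labels = new_labels
    return centroids, labels


# Toy per-residue embeddings (assumption: 2-D here, proteins of varying length).
toy_residue_embeddings = {
    "enzA": [[0.0, 0.2], [0.2, 0.0]],
    "enzB": [[0.3, 0.1], [0.1, 0.3], [0.2, 0.2]],
    "enzC": [[5.0, 5.2], [5.2, 5.0]],
    "enzD": [[4.8, 5.0], [5.0, 4.8], [4.9, 4.9]],
}
names = list(toy_residue_embeddings)
# One fixed-length mean-pooled embedding (MPE) per protein, whatever its length.
mpe = [mean_pool(toy_residue_embeddings[n]) for n in names]
centroids, labels = kmeans(mpe, k=2)

clusters = {}
for name, lab in zip(names, labels):
    clusters.setdefault(lab, []).append(name)
print(clusters)
```

Mean pooling is what lets proteins of different lengths share one embedding space, which is the precondition for clustering them.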

Motivation

Achievement

Xylanase discovery: more than 100-fold increase in activity among glycosyl hydrolases.
Imine reductase (IRED) study: more than 100-fold increase in the catalytic diversity profile.
Prediction accuracy: screening about 10 enzymes is enough to generate predictions with ~90% confidence.
Sequence-space exploration: reaches deep regions of sequence space where most residues have been exchanged.
Per-step efficiency: each iteration cycle requires only as many experiments as there are clusters.
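The per-step efficiency claim can be illustrated with a short Python sketch of one screening round: pick one representative per cluster, assay only those, and carry the best cluster forward. Several parts are assumptions made for illustration and are not confirmed by the review: choosing the member nearest the centroid as representative, keeping only the single best cluster, the function names `pick_representatives` and `next_round_pool`, and the invented activity values in `toy_assay`.

```python
import math


def pick_representatives(embeddings, labels, centroids):
    """For each cluster, return the member closest to its centroid."""
    best = {}
    for name, vec in embeddings.items():
        c = labels[name]
        d = math.dist(vec, centroids[c])
        if c not in best or d < best[c][1]:
            best[c] = (name, d)
    return {c: name for c, (name, _) in best.items()}


def next_round_pool(embeddings, labels, reps, assay):
    """Assay one representative per cluster; keep the best cluster's members."""
    scores = {c: assay(name) for c, name in reps.items()}  # one experiment per cluster
    top = max(scores, key=scores.get)
    pool = {n: v for n, v in embeddings.items() if labels[n] == top}
    return pool, scores


# Toy mean-pooled embeddings with precomputed cluster labels and centroids.
embeddings = {
    "enzA": [0.0, 0.0], "enzB": [0.2, 0.2], "enzE": [0.4, 0.1],
    "enzC": [5.0, 5.0], "enzD": [5.4, 5.0], "enzF": [5.2, 5.3],
}
labels = {"enzA": 0, "enzB": 0, "enzE": 0, "enzC": 1, "enzD": 1, "enzF": 1}
centroids = {0: [0.2, 0.1], 1: [5.2, 5.1]}

measured = []


def toy_assay(name):
    """Stand-in for a wet-lab measurement; the activity values are invented."""
    measured.append(name)
    return {"enzB": 1.0, "enzF": 12.0}[name]


reps = pick_representatives(embeddings, labels, centroids)
pool, scores = next_round_pool(embeddings, labels, reps, toy_assay)
print(f"assayed {len(measured)} of {len(embeddings)} enzymes: {measured}")
print(f"pool for the next round: {sorted(pool)}")
```

Under these assumptions, the next round would re-cluster only the sequences in `pool`, so the experimental cost per cycle stays equal to the number of clusters however large the candidate set is.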

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 5/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: PLM-clust presents a creative and practical methodology that systematically uses the embedding space of protein language models for enzyme discovery. Its iterative experimental framework achieved more than 100-fold activity improvements within a limited experimental budget, and it marks a conceptual advance that sidesteps the limitations of conventional directed evolution. There is room for improvement, including dependence on initial conditions, a lack of theory behind zero-shot scoring, and a limited validation scope. Even so, the work is judged to be a contribution of considerable practical value to biocatalyst engineering.

Related papers worth reading

Foundational work
A large evolutionary-scale language model for protein structure prediction provides the basis for mining enzyme activity patterns.
Foundational work
527 examines mechanistic interpretability techniques in depth from an AI safety perspective and gives a theoretical grounding for interpreting the PLM embeddings and iterative experimental design of 3275.
Foundational work
Discusses the experimental limitations of masked-LM-based protein embeddings and enzyme function search algorithms, providing a foundation for PLM-clust's iterative discovery workflow.
Alternative approach
749 addresses sequence-based prediction and design of genome and protein function, and can be read as complementary to the encoder language-model-based enzyme discovery framework of 3275.
Alternative approach
Comprehensively evaluates the autonomous exploration and discovery capabilities of LLM-based agents in medicine and biology, and can be compared with work on unsupervised enzyme function discovery.
Follow-up work
Addresses how protein language models learn enzyme patterns without supervision, further extending the interpretation of pattern matching in internal circuits.
Follow-up work
Validates how the patterns learned by unsupervised protein language models connect to real biological reactions.
Follow-up work
Covers how protein language models trained with unsupervised learning pick up patterns of enzyme function, and presents the implications and extensibility of the single-sequence ESM-2.
Follow-up work
Analyzes enzyme function prediction with unsupervised protein language models in a more extended manner.
Application
The results of language-model-based enzyme function exploration can be applied to protein-ligand interaction prediction.