Generative machine learning unlocks the first proteome-wide image of human cells

Authors: | Date: 2026-04-02 | URL: https://www.biorxiv.org/content/10.64898/2026.03.31.715748v1


Essence

Figure 2

Fig. 2. Model evaluation and benchmarking. (A) Realistic virtual immunofluorescence images of major organelles and cellul

This paper develops ProtiCelli, a deep generative model that generates microscopy images of 12,800 human proteins from three cellular landmark stains (nucleus, ER, microtubules). The model was trained on 1.23 million images from the Human Protein Atlas, achieves better reconstruction accuracy and texture fidelity than existing methods, and generalizes to unseen cell types and drug perturbations.

Motivation

Achievement

Figure 1

Fig. 1. ProtiCelli study overview. ProtiCelli was trained on near-proteome-wide single-cell immunofluorescence images fro

ProtiCelli model development and performance: Generates microscopy images for 12,800 proteins and demonstrates better reconstruction accuracy and texture fidelity than existing models.
Generalization: Generalizes to unseen cell types and drug perturbations, and supports cell-cycle stage prediction and inference of drug-induced changes in protein expression and localization.
Proteome2Cell dataset: A dataset of 30.7 million simulated images comprising 2,400 "virtual cells" across 12 human cell lines.
Biological applications: Recapitulates the protein-protein interaction landscape at the single-cell level, interprets compartment-specific functions of moonlighting proteins, and enables unsupervised segmentation of cellular compartments and spatial decomposition of gene sets into functional domains.
Hierarchical single-cell model: Distinguishes conserved from dynamic protein architecture, enabling analysis of single-cell structure at an unprecedented level of detail.
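
The Proteome2Cell figures quoted above are mutually consistent if each virtual cell holds one generated image per protein. That reading is an assumption of this note, not something the review states, and the even split of cells across cell lines is likewise assumed. A minimal arithmetic check in Python:

```python
# Figures taken from the review text.
NUM_PROTEINS = 12_800
NUM_VIRTUAL_CELLS = 2_400
NUM_CELL_LINES = 12

# Assumption: one simulated image per (virtual cell, protein) pair.
total_images = NUM_PROTEINS * NUM_VIRTUAL_CELLS

# Assumption: virtual cells are spread evenly over the cell lines.
cells_per_line = NUM_VIRTUAL_CELLS // NUM_CELL_LINES

print(f"{total_images:,} images (~{total_images / 1e6:.1f} million)")
print(f"{cells_per_line} virtual cells per cell line, if split evenly")
```

Under these assumptions the product is 30,720,000, which matches the reported 30.7 million images.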

How

Originality

Limitation & Further Study

Limitations: The model's conditioning is still restricted to three major landmarks, which may limit how accurately certain cellular compartments (e.g., lysosomes, peroxisomes) are represented. Whether the generated images fully follow the statistical distribution of real experimental images still needs to be verified. The validity of the generated images in the Proteome2Cell dataset rests on only limited experimental validation.

Further study: (1) improve compartment representation accuracy by introducing additional cellular landmarks, (2) model real-time dynamic changes in protein localization, (3) simulate protein mislocalization in disease states (cancer, neurodegenerative disease), (4) predict the localization effects of structural variants and disease mutations, (5) integrate with multimodal imaging data (electron microscopy, super-resolution).

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 5/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: This paper is a landmark study that uses a conditional diffusion model to achieve proteome-scale spatial proteomics simulation for the first time. Its contributions span several levels (generating microscopy images for 12,800 proteins, strong generalization to unseen conditions, and single-cell-level analysis of protein organization), giving it considerable potential to transform cell biology and functional genomics.
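
The review names a conditional diffusion model but gives no architectural detail. As a rough illustration only, the sketch below assumes a standard DDPM-style forward noising process with a linear noise schedule, and conditioning by stacking the three landmark stains as extra input channels. None of these choices, nor any of the function names, are confirmed by the paper. The protein-identity conditioning that such a model would also need is left out.

```python
import math
import random

def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances (a common DDPM default, assumed here)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]

def cumulative_alpha_bar(betas):
    """alpha_bar_t = product of (1 - beta_s) for s <= t: signal variance left at step t."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars

def forward_noise(x0, alpha_bar_t, rng):
    """Sample x_t from N(sqrt(alpha_bar_t) * x0, (1 - alpha_bar_t) * I)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    signal = math.sqrt(alpha_bar_t)
    sigma = math.sqrt(1.0 - alpha_bar_t)
    x_t = [signal * x + sigma * e for x, e in zip(x0, eps)]
    return x_t, eps

def build_denoiser_input(x_t, landmarks):
    """Hypothetical conditioning: noisy protein channel followed by the landmark channels."""
    return [x_t] + [landmarks[name] for name in ("nucleus", "er", "microtubules")]

rng = random.Random(0)
num_pixels = 16  # toy 4x4 image, flattened
protein = [rng.random() for _ in range(num_pixels)]
landmarks = {
    name: [rng.random() for _ in range(num_pixels)]
    for name in ("nucleus", "er", "microtubules")
}

alpha_bars = cumulative_alpha_bar(linear_beta_schedule(1000))
x_t, eps = forward_noise(protein, alpha_bars[499], rng)
model_input = build_denoiser_input(x_t, landmarks)
print(len(model_input), "channels of", len(model_input[0]), "pixels each")
```

In a setup like this, a denoising network would be trained to predict `eps` from `model_input` and the step index, so the landmark channels steer where the generated protein signal appears.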

Related papers worth reading

Foundational work
Evo 2 is a very large foundation model for genome modeling across all domains of life; it provides groundwork for the proteome-level data learning that makes protein image generation possible.
Foundational work
The Foundation models in bioinformatics paper surveys the current state and applications of the biological foundation models that underpin protein and bioinformatics models such as ProtiCelli.
Alternative approach
Geometry Informed Tokenization of Molecules tackles the similar problem of generative representation learning for molecular and protein structures with a different, geometry-based approach, which makes it a useful comparative read.
Alternative approach
The RFdiffusion paper presents a method specialized for de novo protein and antibody design; it addresses structure design rather than protein image generation.
Follow-up work
3382 develops a framework that generates protein images from large-scale data similar to Evo 2, showing a case in which the model from no. 382 is applied to actual image generation.
Follow-up work
Presents examples of applying LLMs from biological molecules to proteome-wide generation tasks, extending the scope of information extraction and data-foundation building.
Follow-up work
The AlphaFold Database provides structural information for human proteins at large scale and is complementary to research on generating protein microscopy images.
Follow-up work
BioMiner mines protein images and functional information through a multimodal system, making it practically applicable to generative-model-based analysis.