Towards Useful and Private Synthetic Omics: Community Benchmarking of Generative Models for Transcriptomics Data

Author: | Date: 2026-03-02 | URL: https://www.biorxiv.org/content/10.64898/2026.03.02.707794v1


Essence

Figure 1. Benchmarking framework for evaluating generative models on synthetic bulk RNA-seq data

This study systematically benchmarks 11 generative models for synthetic bulk RNA-seq data across two cancer cohorts and 978 landmark genes. It evaluates them along four dimensions: distributional fidelity, downstream utility, biological plausibility, and privacy risk. It highlights the trade-off between vulnerability to membership inference attacks and the other evaluation dimensions, and offers guidance for model selection.
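To make the privacy dimension concrete, here is a minimal sketch of one simple membership inference baseline: scoring each candidate record by its distance to the closest synthetic record. This is an illustration under stated assumptions, not the attack used in the benchmark, which this summary does not specify. The function names, the toy "leaky generator", and all parameter values are hypothetical.

```python
import numpy as np

def dcr_scores(candidates, synthetic):
    """Negative distance to the closest synthetic record (higher = more member-like)."""
    # Pairwise Euclidean distances, shape (n_candidates, n_synthetic)
    diffs = candidates[:, None, :] - synthetic[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=2))
    return -dists.min(axis=1)

def auc(scores_pos, scores_neg):
    """Probability that a random member outscores a random non-member (ties count half)."""
    pos = scores_pos[:, None]
    neg = scores_neg[None, :]
    return float((pos > neg).mean() + 0.5 * (pos == neg).mean())

def toy_experiment(seed=0, n=50, n_genes=20, noise=0.05):
    """Attack AUC against a hypothetical generator that nearly copies its training data."""
    rng = np.random.default_rng(seed)
    members = rng.normal(size=(n, n_genes))      # stand-in for training samples
    non_members = rng.normal(size=(n, n_genes))  # stand-in for held-out samples
    # Assumption for illustration: the "generator" returns members plus small noise
    synthetic = members + noise * rng.normal(size=members.shape)
    return auc(dcr_scores(members, synthetic), dcr_scores(non_members, synthetic))

if __name__ == "__main__":
    print(f"attack AUC on the leaky toy generator: {toy_experiment():.3f}")
```

An AUC close to 1 on this toy setup would indicate that training membership is easy to infer, while a value near 0.5 would mean the attack does no better than chance. A generator that scores well on fidelity by staying very close to its training samples would tend to push this AUC up, which is the kind of trade-off the benchmark examines.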

Motivation

Achievement

Figure 2. Fidelity metrics for BRCA and COMBINED datasets. Four metrics are shown in separate facets:

How

Figure 3. Utility metrics for BRCA and COMBINED datasets. Four metrics are shown, each in a separate

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: An important community study that systematically benchmarks the multi-dimensional utility-privacy-fidelity trade-off in synthetic biomedical data generation. It evaluates a broad range of generative models in the concrete and important context of transcriptomic data and provides practical guidance for model selection. It would be stronger with a fuller discussion of its limited evaluation scope (cancer cohorts, landmark genes, MIA) and of how far the results generalize.

Related Papers Worth Reading

Foundational work
882 is a recent survey on automating citation-based and data-driven verification of scientific papers; it provides a meta-research foundation for 3266's in-depth evaluation of synthetic data.
Foundational work
Its discussion of evaluating the fidelity and utility of bioinformatics foundation models provides the theoretical background for synthetic omics benchmarking.
Alternative approach
757 covers LLM-based tabular data generation and benchmarking, which makes it directly comparable to 3266's benchmarking of synthetic omics data.
Alternative approach
Comparable from the perspective of benchmarking multimodal generative models for synthetic omics generation.
Alternative approach
A synthetic data generation approach for protecting the privacy of omics data; it offers a different way of resolving the privacy-utility trade-off.
Alternative approach
The paper What Do Biological Foundation Models Compute offers an empirical alternative interpretation of the generation and reliability problems of biological data.
Application
Mutational signature synthesis can be applied directly to unsupervised/generative omics data generation and utility evaluation.
Application
Allows discussion of whether high-dimensional, data-driven methods for potential reliability verification and free-energy prediction can be put to practical use in synthetic omics generation and validation benchmarking.