Genome modeling and design across all domains of life with Evo 2

Authors: Garyk Brixi, Matthew G. Durrant, Jerome Ku, Michael Poli, Greg Brockman, Daniel Chang, Gabriel A. Gonzalez, Samuel H. King, David B. Li, Aditi T. Merchant, Mohsen Naghipourfar, Eric Nguyen, Chiara Ricci-Tam, David W. Romero, Gwanggyu Sun, Ali Taghibakshi, Anton Vorontsov, Brandon Yang, Myra Deng, Liv Gorton, Nam Nguyen, Nicholas K. Wang, Etowah Adams, Stephen A. Baccus, Steven Dillmann, Stefano Ermon, Daniel Guo, Rajesh Ilango, Ken Janik, Amy X. Lu, Reshma Mehta, Mohammad R.K. Mofrad, Madelena Y. Ng, Jaspreet Pannu, Christopher Ré, Jonathan C. Schmok, John St. John, Jeremy Sullivan, Kevin Zhu, Greg Zynda, Daniel Balsam, Patrick Collison, Anthony B. Costa, Tina Hernandez-Boussard, Eric Ho, Ming-Yu Liu, Thomas McGrath, Kimberly Powell, Dave P. Burke, Hani Goodarzi, Patrick D. Hsu, Brian L. Hie | Date: 2025-02-21 | DOI: 10.1101/2025.02.18.638918


Essence

Figure 1

Figure 1 | Overview of model architecture, training procedure, datasets, and evaluations for Evo 2.

Evo 2 is a biological foundation model trained on 9.3 trillion DNA base pairs. It comes in 7B and 40B parameter versions with a 1-million-token context window, and it performs genome modeling and design across all domains of life.
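
To make the scale figures above concrete, here is a minimal sketch, assuming one token per nucleotide (consistent with this review quoting the training set both as 9.3 trillion base pairs and as 9.3 trillion tokens). The vocabulary, the function names `tokenize` and `windows_needed`, and the 4.6-megabase example length are hypothetical illustrations, not Evo 2's actual tokenizer or API.

```python
import math

# Hypothetical single-nucleotide vocabulary; the real Evo 2 tokenizer may differ.
VOCAB = {"A": 0, "C": 1, "G": 2, "T": 3, "N": 4}

CONTEXT_WINDOW = 1_000_000  # tokens, as stated in the review


def tokenize(seq: str) -> list[int]:
    """Map each nucleotide to one integer token (unknown bases become N)."""
    return [VOCAB.get(base, VOCAB["N"]) for base in seq.upper()]


def windows_needed(num_bases: int, window: int = CONTEXT_WINDOW) -> int:
    """Number of non-overlapping context windows needed to cover a sequence."""
    return math.ceil(num_bases / window)


tokens = tokenize("ACGTnacgt")
print(tokens)                      # one token per base
print(windows_needed(4_600_000))   # an assumed genome of about 4.6 million bases
```

Under this assumption, a sequence of up to one million bases fits in a single context window, and longer genomes would have to be split across several windows.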

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall: As a genome foundation model, Evo 2 achieves unprecedented scale (9.3 trillion tokens, a 1-million-token context window) and performance (variant effect prediction, genome-scale generation, mechanistic interpretability). With fully open models and datasets, it makes an innovative contribution to synthetic biology and genome design.

Papers worth reading alongside

Foundational work
382 is a foundational model for ESM-based genome design and prediction, and the Evo model of 749 provides a theoretical basis for carrying out a variety of biological tasks.
Foundational work
Provides the basis for discussing the architectural character of genome foundation models that span the domains of life, and for DNA sequence design strategies.
Foundational work
The paper "Foundation models in bioinformatics" comprehensively covers the current state, limitations, and practical applications of bioinformatics foundation models such as Evo 2.
Foundational work
Evo 2 is a very-large-scale foundation model for genome modeling across all organisms, and it provides the groundwork for proteome-level data learning capable of generating protein images.
Alternative approach
Presents sequence design and modeling methodology, from molecules to whole genomes, with a different scope and strategy, which makes it complementary.
Alternative approach
Presents a different implementation of a large language model for biological sequence data.
Alternative approach
A similar study on AI-based methodology for genome analysis and design.
Alternative approach
A related study on other generative models for DNA and protein sequence design.
Follow-up work
The paper "Genome modeling and design across all domains of life with Evo 2" empirically demonstrates the real-world performance of foundation-model-based genome design and the importance of data scale.
Follow-up work
3382 develops a framework that generates protein images from large-scale data similar to that of Evo 2, showing a case where the model of 382 is applied to actual image generation.
Application case
"Genome modeling and design across all domains of life with Evo 2" is a case that demonstrates the application of large scientific LLMs as a biological sequence model.