FLIP2: Expanding Protein Fitness Landscape Benchmarks for Real-World Machine Learning Applications

Authors: | Date: 2026-02-25 | URL: https://www.biorxiv.org/content/10.64898/2026.02.23.707496v2


Essence

Figure 1

Figure 1. The FLIP2 benchmark. (A) The FLIP2 datasets. Solid boxes indicate the name of the dataset. Dashed boxes indica

This paper presents the FLIP2 benchmark for evaluating how well ML-based fitness prediction generalizes in protein engineering. It introduces seven new datasets covering enzymes, light-sensitive proteins, and protein-protein interactions, together with 16 split strategies that include changes in wild-type background and fine-grained positional changes, so that the benchmark more realistically reflects the domain shifts encountered in real protein engineering campaigns.
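To make the idea of a position-based split concrete, here is a minimal sketch. It is not the FLIP2 authors' code: the `Variant` type, the `position_split` helper, the mutation string format (`A12G` = wild-type residue, position, new residue), and the example data are all hypothetical and chosen only for illustration. The sketch holds out every variant that touches a chosen set of positions, so the test set contains mutations at sites the model never saw during training.

```python
from typing import Iterable, List, NamedTuple, Set, Tuple


class Variant(NamedTuple):
    """Hypothetical record: a set of point mutations and a measured fitness."""
    mutations: Tuple[str, ...]  # e.g. ("A12G", "L45P"); format is an assumption
    fitness: float


def mutated_positions(variant: Variant) -> Set[int]:
    """Extract the residue positions from strings such as 'A12G'."""
    return {int(m[1:-1]) for m in variant.mutations}


def position_split(
    variants: Iterable[Variant], held_out_positions: Iterable[int]
) -> Tuple[List[Variant], List[Variant]]:
    """Send any variant touching a held-out position to the test set."""
    held = set(held_out_positions)
    train: List[Variant] = []
    test: List[Variant] = []
    for v in variants:
        if mutated_positions(v) & held:
            test.append(v)
        else:
            train.append(v)
    return train, test


# Made-up example data, for illustration only.
example_variants = [
    Variant(("A12G",), 0.9),
    Variant(("L45P",), 0.2),
    Variant(("A12G", "L45P"), 0.1),
]
train_set, test_set = position_split(example_variants, held_out_positions={45})
```

With position 45 held out, only the single mutant at position 12 stays in training, while both variants involving position 45 move to the test set. A random split would instead mix these, which tends to hide the domain shift the review describes.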

Motivation

Achievement

How

Originality

Limitation & Further Study

Evaluation

Novelty: 4/5 | Technical Soundness: 4/5 | Significance: 4/5 | Clarity: 4/5 | Overall: 4/5

Overall assessment: FLIP2 is an important advance in protein engineering benchmarking. It extends the original FLIP with seven new datasets and 16 splits, enabling practice-oriented evaluation, and its re-examination of the role of fine-tuned pLMs raises important questions for the field. However, some of the datasets are small, and the paper lacks an in-depth analysis of the root causes of degraded transfer learning performance. As a benchmark paper, its methodological soundness and comprehensiveness are high, but its mechanistic insight is limited.

Related papers worth reading

Foundational work
The basic structure of scientific AI presented in paper 105 also underpins protein fitness landscape modeling.
Foundational work
Paper 144 proposes an LLM-based framework for protein binding site prediction, so it provides background for the extended fitness benchmark design in 3104.
Alternative approach
A case in which protein language models such as ESM-2 achieved proteome-wide prediction; useful for discussing the comprehensiveness and broader impact of the FLIP2 benchmark.
Follow-up work
Paper 749 covers sequence-based and structure-based protein/genome design and prediction, so it can be connected to the applicability of 3104 to real protein engineering campaigns.
Follow-up work
Analyzes in depth the actual performance and data usage of deep-learning-based prediction, as revealed through protein fitness landscape benchmarks.
Follow-up work
Paper 3147, like FLIP2 (3104), addresses benchmarking and reproducibility problems in protein function prediction from a long-term perspective.
Application
The paper on binding affinity prediction for novel antibodies and proteins can be linked, as a real application case, to the empirical study of fitness landscape generalization addressed in FLIP2.
Application
Links directly to FLIP2 as a realistic comparison case, spanning real-world protein variant prediction and the diversity of sampling methods in antibody engineering.