Contrastive learning for antibody-antigen sequence-to-specificity prediction

์ €์ž: | ๋‚ ์งœ: 2026-02-26 | URL: https://www.biorxiv.org/content/10.64898/2026.02.25.707916v1 📄 PDF


Essence

Figure 1

Fig 1. Schematic of CALM (Cross-attention Adaptive Immune Receptor–Antigen Language Model) architecture.

CALM์€ contrastive learning์„ ํ†ตํ•ด ํ•ญ์ฒด์™€ ํ•ญ์› ์„œ์—ด์„ ๊ณต์œ  ์ž„๋ฒ ๋”ฉ ๊ณต๊ฐ„์— ์ •๋ ฌํ•˜๋Š” dual-encoder ์•„ํ‚คํ…์ฒ˜๋กœ, sequence-to-specificity ์˜ˆ์ธก ๋ฌธ์ œ๋ฅผ ํ’€๊ธฐ ์œ„ํ•œ ๊ธฐ์ดˆ ๋ชจ๋ธ์ด๋‹ค. SAbDab์˜ 4,138๊ฐœ ํ•ญ์ฒด-ํ•ญ์› ์Œ์œผ๋กœ ํ•™์Šตํ•œ ๊ฒฐ๊ณผ, 80% ๋™์ผ์„ฑ ์ˆ˜์ค€์˜ ์œ ์ถœ ํ†ต์ œ ํ‰๊ฐ€์—์„œ ์–‘๋ฐฉํ–ฅ R@1 ํ‰๊ท  7%๋ฅผ ๋‹ฌ์„ฑํ–ˆ๋‹ค.

Motivation

Achievement

ํ•ญ์ฒด-ํ•ญ์› co-embedding ๊ตฌํ˜„: dual-encoder contrastive stage๋ฅผ ์™„์ „ ๊ตฌํ˜„ํ•˜์—ฌ ์–‘๋ฐฉํ–ฅ ๊ฒ€์ƒ‰(Abโ†’Ag, Agโ†’Ab) ์ง€์›. ์œ ์ถœ ํ†ต์ œ ํ‰๊ฐ€: SAbDab 4,138๊ฐœ ์Œ ์ค‘ 5ร… ๊ฑฐ๋ฆฌ ๊ธฐ๋ฐ˜ binding-site mask ์ƒ์„ฑ, MMseqs2 ํด๋Ÿฌ์Šคํ„ฐ๋ง์œผ๋กœ 80% ๋™์ผ์„ฑ ๋ˆ„์ถœ ํ†ต์ œ ํ…Œ์ŠคํŠธ ์„ธํŠธ ๊ตฌ์„ฑ. ์„ฑ๋Šฅ: 80% ๋™์ผ์„ฑ ํด๋Ÿฌ์Šคํ„ฐ ๋ˆ„์ถœ ํ†ต์ œ ํ‰๊ฐ€์—์„œ ํ‰๊ท  R@1 7% ๋‹ฌ์„ฑ, ์–‘๋ฐฉํ–ฅ ์„ฑ๋Šฅ ์ผ๊ด€์„ฑ ํ™•์ธ. ๊ตฌ์กฐ ์„ค๊ณ„: decoder stage ๊ฐœ๋…์  ํ”„๋ ˆ์ž„์›Œํฌ๋กœ ํ–ฅํ›„ epitope mapping๊ณผ ์กฐ๊ฑด๋ถ€ ์„ค๊ณ„ ๊ฐ€๋Šฅ์„ฑ ์ œ์‹œ.

How

Figure 2

Fig 2. Dataset curation and preprocessing. Ab–Ag complex structures were extracted from SAbDab. Complexes were

Originality

Limitation & Further Study

Limitations:
- An R@1 of 7% is low relative to what practical applications (e.g., diagnostics, therapeutic design) would require, and top-k retrieval performance (R@5, R@10) is not reported.
- 4,138 pairs is small relative to proteome or repertoire scale, and the dependence on structural annotation limits scalability.
- The proposed decoder stage is not implemented, so generative capability is unverified; only contrastive retrieval is evaluated.
- There is no performance breakdown by antigen type (viral, tumor antigens, etc.) or by antibody characteristics (affinity, level of somatic mutation).

Further study:
- Integrate larger datasets and sequences that lack structural annotation.
- Implement the decoder and evaluate conditional generation.
- Improve generalization to novel epitopes.
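For reference, the retrieval metrics discussed here (R@1, and the unreported R@5 and R@10) can be computed from paired embeddings as in the sketch below. It assumes a one-to-one pairing in which row i of the antibody matrix matches row i of the antigen matrix, and that similarity is cosine similarity; neither detail is confirmed by this summary, and the function names are hypothetical.

```python
import numpy as np

def recall_at_k(query_emb, target_emb, k=1):
    """Fraction of queries whose true partner (same row index) ranks in the top k."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    t = target_emb / np.linalg.norm(target_emb, axis=1, keepdims=True)
    sims = q @ t.T  # cosine similarity, shape (n_queries, n_targets)

    # Rank of the true partner = number of targets scoring strictly higher.
    true_scores = np.diag(sims)[:, None]
    ranks = (sims > true_scores).sum(axis=1)
    return float((ranks < k).mean())

def bidirectional_recall(ab_emb, ag_emb, ks=(1, 5, 10)):
    """Recall@k for Ab->Ag and Ag->Ab retrieval, plus their mean."""
    results = {}
    for k in ks:
        ab_to_ag = recall_at_k(ab_emb, ag_emb, k)
        ag_to_ab = recall_at_k(ag_emb, ab_emb, k)
        results[k] = {
            "Ab->Ag": ab_to_ag,
            "Ag->Ab": ag_to_ab,
            "mean": 0.5 * (ab_to_ag + ag_to_ab),
        }
    return results

# Toy data: random vectors standing in for encoder outputs.
rng = np.random.default_rng(0)
ab = rng.normal(size=(50, 32))
ag = rng.normal(size=(50, 32))
perfect = bidirectional_recall(ab, ab)  # identical embeddings on both sides
chance = bidirectional_recall(ab, ag)   # unrelated embeddings
```

With unrelated embeddings the expected R@1 is about 1/N for a pool of N candidates, so any reported R@1 is best read against the size of the retrieval pool.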

Evaluation

Novelty: 4/5 Technical Soundness: 3/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

์ดํ‰: CALM์€ contrastive learning์œผ๋กœ ํ•ญ์ฒด-ํ•ญ์› sequence-to-specificity ์˜ˆ์ธก์˜ ๊ธฐ์ดˆ๋ฅผ ๊ตฌ์ถ•ํ•œ ์˜๋ฏธ ์žˆ๋Š” ์—ฐ๊ตฌ์ด๋‹ค. ์–‘๋ฐฉํ–ฅ ๊ฒ€์ƒ‰๊ณผ ์œ ์ถœ ํ†ต์ œ ํ‰๊ฐ€ ์„ค๊ณ„๊ฐ€ ๊ฒฌ๊ณ ํ•˜๋ฉฐ ISFM ๊ฐœ๋…์„ ๊ตฌ์ฒดํ™”ํ–ˆ์œผ๋‚˜, R@1 7%์˜ ๋‚ฎ์€ ์ ˆ๋Œ€ ์„ฑ๋Šฅ๊ณผ ๋ฏธ๊ตฌํ˜„ decoder, ์ œํ•œ๋œ ๋ฐ์ดํ„ฐ์…‹ ๊ทœ๋ชจ๋Š” ์‹ค๋ฌด ์ ์šฉ ์ „์— ํ•ด๊ฒฐ์ด ํ•„์š”ํ•˜๋‹ค. ๋ถ€ํ˜ธ๊ฐ€(Immune Specificity Foundation Model) ๊ฐœ๋ฐœ ๋ฐฉํ–ฅ ์ œ์‹œ๋กœ์„œ์˜ ๊ฐ€์น˜๋Š” ๋†’๋‹ค.

๊ฐ™์ด ๋ณด๋ฉด ์ข‹์€ ๋…ผ๋ฌธ

๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
์ƒ์ฒด๋ถ„์ž์™€ ์–ธ์–ด๋ชจ๋ธ์˜ ๋ฉ€ํ‹ฐ๋ชจ๋‹ฌ ์ž„๋ฒ ๋”ฉ ํ™œ์šฉ ์—ฐ๊ตฌ๋กœ, ์‹œํ€€์Šค-ํˆฌ-ํŠน์ด์„ฑ ๋ฌธ์ œ์— ๋Œ€ํ•œ ํ˜„๋Œ€์  ๋ชจ๋ธ ๊ตฌ์กฐ์˜ ๊ธฐ์ดˆ๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
344๋ฒˆ ๋…ผ๋ฌธ์€ ์ƒ๋ฌผ์ •๋ณดํ•™ ํŒŒ์šด๋ฐ์ด์…˜ ๋ชจ๋ธ ๋ฐ ํ•ญ์ฒด ์˜ˆ์ธก AI์˜ ๊ธฐ์ˆ ์  ํ† ๋Œ€๋ฅผ ์ •๋ฆฌํ•˜์—ฌ, CALM์˜ ์„ค๊ณ„ยท์‘์šฉ ์ดํ•ด์— ๋„์›€์ด ๋ฉ๋‹ˆ๋‹ค.
๊ธฐ๋ฐ˜ ์—ฐ๊ตฌ
ํ•ญ์ฒด-ํ•ญ์› ๊ฒฐํ•ฉ ์นœํ™”์„ฑ ์˜ˆ์ธก์—์„œ LLM ๊ธฐ๋ฐ˜ ๋ชจ๋ธ ์‚ฌ์šฉ ์‚ฌ๋ก€๋กœ, CALM์˜ contrastive learning ๋ฐฉ์‹ ๋Œ€ foundation LLM ๋น„๊ต ๊ธฐ๋ฐ˜์„ ์ œ๊ณตํ•œ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
3000๋ฒˆ ๋…ผ๋ฌธ์€ ๋ฆฌ์ŠคํŠธ ๋žญํ‚น ๊ธฐ๋ฐ˜ ํ•ญ์ฒด-ํ•ญ์› ์นœํ™”๋„ ์˜ˆ์ธก์œผ๋กœ, sequence-to-specificity ๋ฌธ์ œ์— ๋ฆฌ์ŠคํŠธ ๊ธฐ๋ฐ˜ ์ ‘๊ทผ๊ณผ contrastive approach์˜ ์ฐจ์ด๋ฅผ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
๋‹ค๋ฅธ ์ ‘๊ทผ
ํ•ญ์ฒด-ํ•ญ์› ํŠน์ด์„ฑ ์˜ˆ์ธก์— ์„œ์—ดยท๊ตฌ์กฐยทํ†ต๊ณ„ ๊ธฐ๋ฐ˜ ๋ชจ๋ธ (GVP, DCA ๋“ฑ) ๊ฐ„ ํŠน์ง• ๋ฐ ์„ฑ๋Šฅ์„ ๋น„๊ตํ•  ์ˆ˜ ์žˆ์Šต๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
ํ•ญ์ฒด-ํ•ญ์› ๊ฒฐํ•ฉ ํŠน์ด์„ฑ ์˜ˆ์ธก์„ ์œ„ํ•œ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฐ˜ ๋ฐฉ๋ฒ•์œผ๋กœ, ๋‹จ๋ฐฑ์งˆ ์„œ์—ด ์„ค๊ณ„์˜ ์‹ค์ œ ์‘์šฉ์‚ฌ๋ก€๋ฅผ ์ œ์‹œํ•ฉ๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
3062 ๋…ผ๋ฌธ์€ ๋Œ€์กฐํ•™์Šต์„ ๊ธฐ๋ฐ˜์œผ๋กœ ํ•ญ์ฒด-ํ•ญ์› ์ƒํ˜ธ์ž‘์šฉ๊นŒ์ง€ ๋ฒ”์œ„๋ฅผ ํ™•์žฅํ•˜์—ฌ DrugCLIP์˜ ๋ฐฉ๋ฒ•๋ก ์„ ์‹ฌํ™”์‹œํ‚ต๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
ํ•ญ์›-ํ•ญ์ฒด ์„œ์—ด-ํŠน์ด์„ฑ ์˜ˆ์ธก์— ํŠนํ™”๋œ ๋”ฅ๋Ÿฌ๋‹ ๊ธฐ๋ฒ•์„ ์†Œ๊ฐœํ•˜์—ฌ, ์›์ž์  ์ •ํ™•๋„ ํ•ญ์ฒด ์„ค๊ณ„์˜ ํ›„์† ๋ฐœ์ „ ์—ฐ๊ตฌ์— ์•„์ด๋””์–ด๋ฅผ ์ œ๊ณตํ•ฉ๋‹ˆ๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
Latent-Y๋Š” de novo ํ•ญ์ฒด ์„ค๊ณ„ ์—์ด์ „ํŠธ ์‹œ์Šคํ…œ์œผ๋กœ, CALM๊ณผ ๊ฐ™์€ sequence-to-specificity ๋ชจ๋ธ์ด ์‹คํ—˜ ํŒŒ์ดํ”„๋ผ์ธ์— ์ ์šฉ๋˜๋Š” ์‚ฌ๋ก€๋ฅผ ๋ณด์—ฌ์ค€๋‹ค.
ํ›„์† ์—ฐ๊ตฌ
3062 ๋…ผ๋ฌธ์€ ํ•ญ์ฒด-ํ•ญ์› ์ƒํ˜ธ์ž‘์šฉ๊นŒ์ง€ ์˜ˆ์ธก๋ฒ”์œ„๋ฅผ ํ™•์žฅํ•จ์œผ๋กœ์จ AbAffinity์˜ ๋‹จ๋ฐฑ์งˆ-ํ•ญ์ฒด ๊ฒฐํ•ฉ ์˜ˆ์ธก์„ ๋ณด์™„ ์‹ฌํ™”ํ•ฉ๋‹ˆ๋‹ค.
← ๋ชฉ๋ก์œผ๋กœ ๋Œ์•„๊ฐ€๊ธฐ

๐ŸŽง Audio Overview

์ด ๋…ผ๋ฌธ ๋ฆฌ๋ทฐ๋ฅผ ํŒŸ์บ์ŠคํŠธํ˜• ์˜ค๋””์˜ค๋กœ ์ƒ์„ฑํ•ฉ๋‹ˆ๋‹ค. (Gemini ยท ํ‚ค๋Š” ๋ธŒ๋ผ์šฐ์ €์—๋งŒ ์ €์žฅ ยท ์™„์„ฑ๋ณธ์€ ์ด๋ฉ”์ผ๋กœ๋„ ์ „์†ก)
โ–ธ ๊ณ ๊ธ‰: ๊ตฌ์„ฑ ๋ฐฉํ–ฅ(๋Œ€๋ณธ ์ž‘์„ฑ ์ง€์นจ) ์ง์ ‘ ์ˆ˜์ •