Proteo-R1: Reasoning Foundation Models for De Novo Protein Design

Motivation

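Known: Existing generative protein design models, such as RFDiffusion (diffusion-based structure generation) and ProteinMPNN (sequence design), produce designs directly, but the design intent is encoded only implicitly inside the generative process, which limits interpretability and controllability. Human protein engineers, by contrast, first identify the functionally important residues and then perform geometric optimization.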
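Gap: Existing generative models treat all residues uniformly and entangle reasoning with generation, so the design logic is hard to reuse and hard to control. Injecting textual guidance directly into the continuous dynamics also causes problems for stability and interpretability. What is needed is a framework that combines the reasoning ability of LLMs with the geometric generation ability of diffusion models while keeping the two explicitly separated.
Why: Protein design must satisfy structural stability and functional efficiency at the same time, yet existing black-box generative models do not state the biochemical rationale behind their design decisions. This makes it hard to assess the reliability of a result or to redesign it. Separating out the reasoning step improves scientific interpretability and lets domain knowledge be used systematically.
Approach: Proteo-R1 consists of two specialized modules. (1) Understanding Expert: a multimodal LLM integrates sequence encoding (ESM-2), structure encoding (AF3-style), and textual context to identify functional residues and decide their biochemical identity. (2) Generation Expert: an AF3-like diffusion model receives the residue-level constraints from the understanding expert as hard constraints and performs conditional codesign. Information flows between the two modules through explicit injection in the residue embedding space. A three-stage curriculum stabilizes cross-modal grounding, geometric reasoning, and end-to-end design in sequence.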
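The three-stage curriculum can be written down as a small ordered configuration. The sketch below is illustrative only: the `Stage` class, the `run_curriculum` helper, and the `train_stage` hook are hypothetical names, and pairing each stage with one of the three goals listed above is an assumption based on the order in which they are mentioned. The data source for Stage III is not stated in this summary, so it is left unspecified.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Stage:
    """One curriculum stage (hypothetical container, not from the paper)."""
    name: str
    data: str
    goal: str


# Stage names and data sources follow the summary; the goal pairing is assumed.
CURRICULUM = (
    Stage("Stage I: Multimodal Alignment", "general protein data (PDB)", "cross-modal grounding"),
    Stage("Stage II: Mid-Training", "antibody-specific data (SAbDab)", "geometric reasoning"),
    Stage("Stage III: Joint Training", "unspecified", "end-to-end design"),
)


def run_curriculum(stages: Sequence[Stage], train_stage: Callable[[Stage], str]) -> List[str]:
    """Run the stages strictly in order; train_stage is a caller-supplied hook."""
    return [train_stage(stage) for stage in stages]


# Stand-in hook that only records what would be trained.
log = run_curriculum(CURRICULUM, lambda s: f"{s.name} -> {s.goal}")
for entry in log:
    print(entry)
```

Keeping the stages in an ordered, immutable tuple makes the sequential dependency explicit: each stage starts from whatever the previous one produced.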

How

Figure 2. Three-stage training diagram of Proteo-R1. In Stage I (Multimodal Alignment), the framework uses general prote

Understanding Expert: takes a complex with masked CDR regions as input and integrates, through multimodal alignment, the ESM-2 sequence embedding, an AF3-style structural representation, and text-based biochemical context (antigen hotspots, design intent, etc.).
Key Residue Identification: the LLM determines the positions of functionally important residues, such as salt-bridge hotspots and specificity-determining motifs, together with their preferred biochemical identity.
Residue Embedding Injection: the understanding expert's hidden representation is mapped by a projection layer into the diffusion model's residue embedding space, where it explicitly replaces the standard <X> embedding at the key residue positions.
Conditional Codesign: the generation expert's diffusion process keeps the fixed residue constraints in place while optimizing the remaining degrees of freedom.
Three-stage training: Stage I (Multimodal Alignment, general protein data from the PDB), Stage II (Mid-Training, antibody-specific data from SAbDab), Stage III (Joint Training, end-to-end optimization).
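The embedding injection and hard-constraint steps can be illustrated with a toy example. This is a minimal sketch under stated assumptions, not the paper's implementation: it assumes one LLM hidden vector per residue position, a single linear projection, and a constraint enforced by re-imposing the key-residue values after each update. All function names (`project`, `inject_key_residues`, `constrained_update`) and all numbers are hypothetical.

```python
from typing import List, Sequence

Vector = List[float]


def project(hidden: Sequence[float], weight: Sequence[Sequence[float]],
            bias: Sequence[float]) -> Vector:
    """Linear projection from LLM space to residue embedding space.

    weight has shape (d_gen, d_llm); a single linear layer is an assumption.
    """
    return [sum(w * h for w, h in zip(row, hidden)) + b
            for row, b in zip(weight, bias)]


def inject_key_residues(residue_emb, llm_hidden, weight, bias, key_positions):
    """Replace the placeholder embedding at each key position with the
    projected LLM hidden state; all other positions are left unchanged."""
    injected = [list(vec) for vec in residue_emb]
    for pos in key_positions:
        injected[pos] = project(llm_hidden[pos], weight, bias)
    return injected


def constrained_update(current, proposal, key_positions):
    """Accept the proposed update everywhere except at key positions,
    which keep their current (fixed) values."""
    fixed = set(key_positions)
    return [list(current[i]) if i in fixed else list(proposal[i])
            for i in range(len(current))]


# Toy setup: 4 residues, d_llm = 3, d_gen = 2, placeholder <X> = zero vector.
residue_emb = [[0.0, 0.0] for _ in range(4)]
llm_hidden = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 1.0, 1.0]]
weight = [[1.0, 2.0, 3.0], [0.5, 0.0, -1.0]]
bias = [0.0, 1.0]
key_positions = [1, 3]

conditioned = inject_key_residues(residue_emb, llm_hidden, weight, bias, key_positions)

# Stand-in for one generation step: shift every entry, then re-impose the constraint.
proposal = [[v + 0.1 for v in vec] for vec in conditioned]
updated = constrained_update(conditioned, proposal, key_positions)
print(conditioned)
print(updated)
```

In this toy run only positions 1 and 3 carry information from the understanding expert, and the update step changes only the unconstrained positions, which mirrors the separation between deciding key residues and optimizing the rest.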