Flexible Motion In-betweening with Diffusion Models

Essence

Figure 1: Flexible motion in-betweening given a text prompt and spatio-temporally sparse keyframes. From left to right:

CondMDI is a unified, diffusion-based motion in-betweening method that takes flexible keyframe constraints together with a text condition and generates diverse, precise human motion.

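Known: RNN- and Transformer-based methods have been proposed for motion in-betweening, but they are limited to fixed keyframe patterns. Diffusion-based methods, for their part, show unnatural artifacts such as foot sliding when full joint trajectories are specified.
Gap: Existing diffusion-based methods do not support temporally sparse keyframes and partial pose specification at the same time, and no unified model flexibly handles diverse keyframe placement patterns together with text conditions.
Why: Motion in-betweening is a labor-intensive core task in character animation, and production use calls for a practical solution that offers both flexible constraint handling and diverse outputs.
Approach: A masked conditional diffusion model is trained on randomly sampled keyframes and joints, together with a mask that marks which keyframes and features are observed, so that a single model accommodates a wide range of motion in-betweening scenarios.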
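
The masked conditioning described under Approach can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the function names, the `max_keyframes` and `feature_keep_prob` parameters, and the choice to overwrite observed entries with clean values before concatenating the mask are all assumptions made for illustration. The forward-noising step is the standard DDPM formulation.

```python
import numpy as np

def sample_keyframe_mask(num_frames, num_features, rng,
                         max_keyframes=20, feature_keep_prob=0.5):
    """Sample a binary mask of shape (num_frames, num_features).

    A 1 marks an observed entry. Both the number of keyframes and the
    subset of features kept per keyframe are random (hypothetical scheme).
    """
    mask = np.zeros((num_frames, num_features), dtype=np.float32)
    num_keyframes = rng.integers(1, min(max_keyframes, num_frames) + 1)
    frames = rng.choice(num_frames, size=num_keyframes, replace=False)
    for f in frames:
        keep = rng.random(num_features) < feature_keep_prob
        if not keep.any():
            # Guarantee at least one observed feature per keyframe.
            keep[rng.integers(num_features)] = True
        mask[f, keep] = 1.0
    return mask

def forward_noise(x0, alpha_bar_t, rng):
    """Standard DDPM forward step: x_t = sqrt(a) * x0 + sqrt(1 - a) * eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps

def build_model_input(x_t, x0, mask):
    """Overwrite observed entries with clean values, then append the mask.

    Assumed conditioning layout: [blended motion | mask] along the feature axis.
    """
    blended = mask * x0 + (1.0 - mask) * x_t
    return np.concatenate([blended, mask], axis=-1)

# Usage: a 60-frame motion with 12 features per frame (toy sizes).
rng = np.random.default_rng(0)
x0 = rng.standard_normal((60, 12)).astype(np.float32)
mask = sample_keyframe_mask(60, 12, rng)
x_t = forward_noise(x0, alpha_bar_t=0.5, rng=rng)
model_input = build_model_input(x_t, x0, mask)
```

Because the mask is resampled for every training example, a denoiser trained on such inputs sees many keyframe layouts, which is the intuition behind handling varied in-betweening scenarios with one model.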

Figure 2: Conditional Motion Diffusion In-betweening (CondMDI) overview. The model is fed a noisy motion sequence x𝑡,

Novelty: 4/5 Technical Soundness: 4/5 Significance: 4/5 Clarity: 4/5 Overall: 4/5

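Overall assessment: CondMDI uses a masked conditional diffusion model to effectively address long-standing limitations of motion in-betweening. By unifying flexible constraint handling with text conditioning, it offers high practical value and a technically strong contribution.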