wan_fantasytalking: an audio-driven video generation model for lip-synced digital humans. Given a single portrait image and an audio clip, it produces a high-fidelity talking video with tight lip synchronization, natural head motion, and facial expressions, with an emphasis on identity consistency and temporal coherence.

Input/Output: single portrait + audio → talking video; focuses on three aspects: lip‑sync accuracy, identity preservation, and natural motion/expressions.

Lip-sync and temporal modeling: uses audio features (e.g., speech, phonemes, visemes) to drive the mouth and facial regions, coupling head motion and expressions with the lips to avoid the "lips-only" uncanny effect.
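As a rough illustration of the audio-to-frame alignment that any audio-driven model of this kind needs, the sketch below splits an audio clip into one sample window per video frame. It is not taken from wan_fantasytalking: the function name `frame_audio_windows`, the 16 kHz sample rate, and the 25 fps frame rate are all assumptions chosen for the example.

```python
def frame_audio_windows(num_samples, sample_rate=16000, fps=25):
    """Return one (start, end) audio sample range per video frame.

    Hypothetical helper for illustration only; sample_rate and fps
    are assumed defaults, not values confirmed for this model.
    """
    if num_samples < 0 or sample_rate <= 0 or fps <= 0:
        raise ValueError("num_samples must be >= 0; sample_rate and fps must be > 0")

    # Integer ceiling: enough frames to cover every audio sample.
    num_frames = -(-num_samples * fps // sample_rate)

    windows = []
    for i in range(num_frames):
        # Integer arithmetic keeps consecutive windows contiguous.
        start = i * sample_rate // fps
        end = min((i + 1) * sample_rate // fps, num_samples)
        windows.append((start, end))
    return windows


if __name__ == "__main__":
    # One second of 16 kHz audio at 25 fps: 640 samples per frame.
    for frame_index, (start, end) in enumerate(frame_audio_windows(16000)[:3]):
        print(frame_index, start, end)
```

Features such as phoneme or viseme embeddings would then be computed per window, so that each generated frame is conditioned on the audio that plays during it.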

WAN2.1 FantasyTalking-Audio Driven-KJ
