Wan2.1-MAGREF is an image-to-video (I2V) variant of the Wan2.1 foundation model that adds MAGREF (Masked Guidance for Any-Reference Video Generation). It generates video from one or more reference images and is designed to preserve subject identity, keeping facial and body features stable throughout the generated clip.
MAGREF introduces region-aware dynamic masks and encodes multiple reference images together, which lets the model integrate information from different viewpoints or from different characters. The training data is filtered for quality, with targeted handling of subtitles, main objects, and faces to improve output stability.
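
To make the idea of a region-aware mask over several reference images more concrete, here is a minimal NumPy sketch. It is not the MAGREF implementation: the function name `compose_reference_canvas`, the left-to-right layout, and the integer region labels are all assumptions chosen for illustration. It only shows how a mask can change with the number and size of the references it describes.

```python
import numpy as np


def compose_reference_canvas(refs, canvas_hw):
    """Place reference images left to right on a blank canvas.

    Hypothetical helper for illustration only, not part of Wan2.1-MAGREF.
    Returns the canvas (H, W, 3) and a mask (H, W) whose value is 0 for
    empty pixels and i + 1 for pixels covered by reference i.
    """
    height, width = canvas_hw
    canvas = np.zeros((height, width, 3), dtype=np.float32)
    mask = np.zeros((height, width), dtype=np.int32)
    x = 0  # left edge of the next free column
    for i, ref in enumerate(refs):
        h, w = ref.shape[:2]
        if h > height or x + w > width:
            raise ValueError("reference images do not fit on the canvas")
        canvas[:h, x:x + w] = ref
        mask[:h, x:x + w] = i + 1
        x += w
    return canvas, mask


# Two toy "reference images" of different sizes (assumed data).
refs = [
    np.ones((4, 3, 3), dtype=np.float32),
    np.full((2, 2, 3), 0.5, dtype=np.float32),
]
canvas, mask = compose_reference_canvas(refs, (4, 8))

# Stack the image channels and the mask into one conditioning array.
conditioning = np.concatenate(
    [canvas, mask[..., None].astype(np.float32)], axis=-1
)
```

In this sketch the mask tells a downstream model which pixels belong to which reference, so adding or resizing a reference changes the mask rather than the model's input layout. How the actual model builds and consumes its masks is not specified here.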