Built on the Wan2.1 family of video diffusion models, this approach guides generation with control signals (conditioning maps), which places it in the controllable video generation paradigm.
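Here is a minimal sketch of one common way a control signal reaches a video denoiser. The encoded control video is concatenated with the noisy latent along the channel axis, frame by frame. This mechanism is an assumption for illustration, because the text does not say which one Wan2.1-based control models use. `concat_condition` and `concat_video` are hypothetical helpers, not Wan2.1 APIs.

```python
from typing import List

# One frame = channels x height x width, stored as nested lists.
Frame = List[List[List[float]]]


def concat_condition(noisy_latent: Frame, control_latent: Frame) -> Frame:
    """Append the control channels after the latent channels.

    Hypothetical helper illustrating channel concatenation; not Wan2.1 code.
    """
    if not noisy_latent or not control_latent:
        raise ValueError("both inputs need at least one channel")
    lat_hw = (len(noisy_latent[0]), len(noisy_latent[0][0]))
    ctl_hw = (len(control_latent[0]), len(control_latent[0][0]))
    if lat_hw != ctl_hw:
        raise ValueError(f"spatial size mismatch: {lat_hw} vs {ctl_hw}")
    # Copy each row so the result does not alias either input.
    return [[row[:] for row in ch] for ch in noisy_latent + control_latent]


def concat_video(latents: List[Frame], controls: List[Frame]) -> List[Frame]:
    """Condition every frame; the control video must match the frame count."""
    if len(latents) != len(controls):
        raise ValueError("latent and control videos need the same frame count")
    return [concat_condition(z, c) for z, c in zip(latents, controls)]


# Usage: 2 frames, 4 latent channels + 1 control channel on a 2x2 grid.
z = [[[0.0, 0.0], [0.0, 0.0]] for _ in range(4)]
c = [[[1.0, 1.0], [1.0, 1.0]]]
out = concat_video([z, z], [c, c])
print(len(out), len(out[0]))  # 2 5
```

The denoiser's input layer then sees latent and control channels side by side. That is why the control video must share the latent's frame count and spatial size.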
Supported control types (the typical conditions for video-driven image and video generation):
- Edges/line art: Canny
- Pose: OpenPose (human keypoints/skeleton)
- Depth: monocular depth maps (constraining spatial relationships in the scene)
- Lines/geometry: MLSD (M-LSD, Mobile Line Segment Detection, which extracts straight line segments; often used for architectural and interior geometric constraints)
- Trajectory control: drive camera motion or local motion along keypoints or paths
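A control type such as Canny is applied to every frame of a driving video, and the output becomes the control video. The code below sketches the edge case. It assumes grayscale frames stored as nested lists with values in [0, 1] and thresholds the Sobel gradient magnitude. It is a simplified stand-in, not Canny itself. Real pipelines typically call OpenCV's `cv2.Canny`, which adds smoothing, non-maximum suppression, and hysteresis thresholding. `sobel_edges`, `edge_video`, and the `threshold` default are hypothetical.

```python
from typing import List

Gray = List[List[float]]  # grayscale frame, values in [0, 1]


def sobel_edges(img: Gray, threshold: float = 0.5) -> List[List[int]]:
    """Binary edge map from the Sobel gradient magnitude.

    Simplified stand-in for Canny: no Gaussian blur, no non-maximum
    suppression, no hysteresis. Border pixels are left as 0.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal (gx) and vertical (gy) Sobel responses.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            if (gx * gx + gy * gy) ** 0.5 >= threshold:
                out[y][x] = 1
    return out


def edge_video(frames: List[Gray], threshold: float = 0.5) -> List[List[List[int]]]:
    """Per-frame edge maps, i.e. the control video for an edge-conditioned run."""
    return [sobel_edges(f, threshold) for f in frames]


# Usage: a vertical step from black (x < 3) to white (x >= 3).
step = [[1.0 if x >= 3 else 0.0 for x in range(5)] for _ in range(5)]
edges = edge_video([step])[0]
print(edges[2])  # [0, 0, 1, 1, 0]
```

The Pose, Depth, and MLSD conditions follow the same per-frame pattern, with a keypoint, depth, or line-segment model in place of the edge detector.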
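Trajectory control usually starts from a few user-placed keypoints rather than a position for every frame. The sketch below assumes keyframes are given as `{frame_index: (x, y)}` and that motion between them follows a straight line. It expands them into a per-frame track and rasterizes that track into single-pixel masks. How Wan2.1-based models actually encode trajectories is not described here, and `interpolate_track` and `track_to_masks` are hypothetical.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def interpolate_track(keys: Dict[int, Point], num_frames: int) -> List[Point]:
    """Expand sparse keyframed (x, y) points into one point per frame.

    Linear interpolation between keyframes; frames before the first or after
    the last keyframe hold that keyframe's position.
    """
    if not keys:
        raise ValueError("need at least one keyframe")
    frames = sorted(keys)
    track: List[Point] = []
    for t in range(num_frames):
        if t <= frames[0]:
            track.append(keys[frames[0]])
        elif t >= frames[-1]:
            track.append(keys[frames[-1]])
        else:
            # Find the keyframe pair (a, b) with a <= t < b.
            b_idx = next(i for i, f in enumerate(frames) if f > t)
            a, b = frames[b_idx - 1], frames[b_idx]
            s = (t - a) / (b - a)
            (xa, ya), (xb, yb) = keys[a], keys[b]
            track.append((xa + s * (xb - xa), ya + s * (yb - ya)))
    return track


def track_to_masks(track: List[Point], height: int, width: int) -> List[List[List[int]]]:
    """One binary mask per frame with the (rounded) track point set to 1."""
    masks = []
    for x, y in track:
        m = [[0] * width for _ in range(height)]
        xi, yi = round(x), round(y)
        if 0 <= yi < height and 0 <= xi < width:  # skip off-frame points
            m[yi][xi] = 1
        masks.append(m)
    return masks


# Usage: move from (0, 0) at frame 0 to (8, 4) at frame 4, over 6 frames.
track = interpolate_track({0: (0.0, 0.0), 4: (8.0, 4.0)}, num_frames=6)
print(track[1], track[5])  # (2.0, 1.0) (8.0, 4.0)
masks = track_to_masks(track, height=5, width=9)
```

Linear interpolation keeps the sketch short. Smoother paths such as splines, or Gaussian heatmaps instead of single pixels, are possible alternatives.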