Wan2.1‑Fun‑InP is part of the Wan2.1‑Fun series. It supports conditioning on a start frame and an end frame to infer the in‑between transition and generate temporally coherent videos, targeting controllable image‑to‑video use cases.
Problem it solves:
Conventional image‑to‑video (I2V) generation only extends the timeline forward from a single starting image, so where the clip ends up is left unconstrained. By adding a terminal keyframe, Fun‑InP guides global motion, composition, and semantics to converge on a specified target, making transitions more controllable and the resulting narrative more coherent.
Inputs: a start‑frame image and an end‑frame image, plus an optional text prompt and/or control signals.
Output: a video clip composed of interpolated middle frames, with the first and last frames aligned in appearance and semantics to the given keyframes.
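The input/output contract above can be sketched in a few lines of Python. This is only an illustration of the idea, not the model's actual interface: `InPRequest`, `keyframe_mask`, and `frames_to_generate` are hypothetical names, and the assumption that the keyframes sit exactly at the first and last frame indices is taken from the description above rather than from the model's code.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class InPRequest:
    """Hypothetical container for a start/end-frame generation request."""
    start_frame: str               # path to the start-frame image
    end_frame: str                 # path to the end-frame image
    num_frames: int                # total number of frames in the output clip
    prompt: Optional[str] = None   # optional text prompt

    def __post_init__(self) -> None:
        # A clip needs at least two frames to hold both keyframes.
        if self.num_frames < 2:
            raise ValueError("num_frames must be at least 2")


def keyframe_mask(num_frames: int) -> List[int]:
    """Return 1 for frames pinned to a keyframe and 0 for frames to infer."""
    if num_frames < 2:
        raise ValueError("num_frames must be at least 2")
    mask = [0] * num_frames
    mask[0] = 1    # first frame is aligned to the start-frame image
    mask[-1] = 1   # last frame is aligned to the end-frame image
    return mask


def frames_to_generate(request: InPRequest) -> List[int]:
    """Indices of the in-between frames that must be synthesized."""
    mask = keyframe_mask(request.num_frames)
    return [i for i, known in enumerate(mask) if known == 0]
```

For a five-frame request, `keyframe_mask(5)` gives `[1, 0, 0, 0, 1]` and `frames_to_generate` returns `[1, 2, 3]`: only the middle frames are left for the model to infer, while the two ends are fixed by the given keyframes.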