ATI (“Any Trajectory Instruction”) is ByteDance’s controllable video generation framework: it injects user‑drawn motion and camera paths into a diffusion model as unified trajectory instructions, enabling precise control over both object motion and camera movement.
ATI is base‑model agnostic and can be adapted to various image‑to‑video (I2V) models. Users draw a dedicated “camera trajectory” on the input image (e.g., dolly in/out, pan/translate, rotate, tilt); ATI encodes it as motion vectors in latent space and injects it alongside object trajectories, producing continuous videos with explicit cinematographic camera behavior.
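To make the encoding step concrete, here is a minimal NumPy sketch of one plausible reading: a trajectory is a per‑frame (x, y) point drawn in image‑pixel coordinates, rescaled to the latent grid, and splatted as a Gaussian bump per frame. The function names (`gaussian_point_map`, `encode_trajectory`) and the 832×480 image / 104×60 latent sizes are illustrative assumptions, not ATI’s actual API:

```python
import numpy as np

def gaussian_point_map(cx, cy, h, w, sigma=1.5):
    """Splat one trajectory point onto an (h, w) grid as a Gaussian bump."""
    ys, xs = np.mgrid[0:h, 0:w]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def encode_trajectory(points, latent_h, latent_w, image_h, image_w):
    """Rasterize a per-frame (x, y) path drawn on the input image into a
    latent-resolution map: one (latent_h, latent_w) channel per frame."""
    sx, sy = latent_w / image_w, latent_h / image_h  # image -> latent scale
    maps = np.zeros((len(points), latent_h, latent_w), dtype=np.float32)
    for t, (x, y) in enumerate(points):
        maps[t] = gaussian_point_map(x * sx, y * sy, latent_h, latent_w)
    return maps

# Hypothetical "pan right" camera path: an anchor point drifting leftward
# across 81 frames (scene content moves opposite to the camera motion).
ts = np.linspace(0.0, 1.0, 81)
pan = np.stack([600.0 - 400.0 * ts, np.full_like(ts, 240.0)], axis=1)
cond = encode_trajectory(pan, latent_h=60, latent_w=104, image_h=480, image_w=832)
print(cond.shape)  # (81, 60, 104)
```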
ATI supports multiple trajectories in parallel (the paper/project page demonstrates up to eight independent trajectories). Camera and object trajectories can act simultaneously, enabling composite moves such as follow shots, orbits, and dolly‑zoom.
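Continuing the sketch above, parallel trajectories could be kept separable by stacking each rasterized map on a channel axis, so the injector can distinguish a camera path from object paths. `combine_trajectories` is again a hypothetical helper under the same assumptions, not part of ATI’s released code:

```python
def combine_trajectories(trajs):
    """Stack K independent trajectory maps (camera + objects) on a channel
    axis, yielding a (frames, K, latent_h, latent_w) conditioning tensor."""
    return np.stack(trajs, axis=1)

# Hypothetical follow shot: the camera pan above plus one object path that
# tracks a subject moving in the same direction.
obj = np.stack([200.0 + 300.0 * ts, 300.0 - 50.0 * ts], axis=1)
multi = combine_trajectories([
    encode_trajectory(pan, 60, 104, 480, 832),
    encode_trajectory(obj, 60, 104, 480, 832),
])
print(multi.shape)  # (81, 2, 60, 104)
```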