This text-to-video workflow is built on LongCat-Video, a unified video generation model with 13.6B parameters. The model natively supports text-to-video, image-to-video, and video continuation, with an emphasis on stable, efficient inference for long videos. A spatiotemporal coarse-to-fine generation strategy, together with Block Sparse Attention, keeps inference efficient at 720p and 30 fps, and multi-reward GRPO is applied as the RLHF stage to improve alignment and quality.
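
To make the Block Sparse Attention idea concrete, here is a minimal NumPy sketch of the general technique: tokens are grouped into blocks, each query block picks its highest-scoring key blocks from cheap mean-pooled summaries, and full attention is computed only inside those selected blocks. This is an illustrative toy, not LongCat-Video's actual kernel; the function name `block_sparse_attention`, the mean-pooling selection rule, and the `block_size` and `keep_blocks` parameters are all assumptions made for the example.

```python
import numpy as np


def block_sparse_attention(q, k, v, block_size=4, keep_blocks=2):
    """Toy block-sparse attention (hypothetical helper, not the model's kernel).

    q, k: arrays of shape (n, d); v: array of shape (n, dv).
    Each query block attends only to its `keep_blocks` best key blocks.
    """
    n, d = q.shape
    assert n % block_size == 0, "sequence length must be a multiple of block_size"
    num_blocks = n // block_size

    # Cheap block-level summaries: mean-pool queries and keys within each block.
    q_blocks = q.reshape(num_blocks, block_size, d).mean(axis=1)
    k_blocks = k.reshape(num_blocks, block_size, d).mean(axis=1)

    # Score every (query block, key block) pair and keep the top ones per row.
    block_scores = q_blocks @ k_blocks.T
    top = np.argsort(-block_scores, axis=1)[:, :keep_blocks]

    out = np.zeros((n, v.shape[1]))
    for i in range(num_blocks):
        rows = slice(i * block_size, (i + 1) * block_size)
        # Token indices covered by the selected key blocks.
        cols = np.concatenate(
            [np.arange(j * block_size, (j + 1) * block_size) for j in top[i]]
        )
        # Standard scaled dot-product attention, restricted to the selected keys.
        scores = q[rows] @ k[cols].T / np.sqrt(d)
        scores = scores - scores.max(axis=1, keepdims=True)
        weights = np.exp(scores)
        weights = weights / weights.sum(axis=1, keepdims=True)
        out[rows] = weights @ v[cols]
    return out


def dense_attention(q, k, v):
    """Reference dense attention, used only to sanity-check the sketch."""
    scores = q @ k.T / np.sqrt(q.shape[1])
    scores = scores - scores.max(axis=1, keepdims=True)
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=1, keepdims=True)
    return weights @ v
```

With `keep_blocks` equal to the total number of blocks, the sketch reduces to dense attention; lowering it cuts the number of query-key products roughly in proportion, which is where the efficiency gain for long, high-resolution sequences would come from.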