A speed-optimized, engineering-enhanced variant of the Wan2.1 T2V stack. It reduces text-to-video generation latency and increases throughput without a noticeable loss of visual quality. The gains typically come from engineering techniques such as attention sparsification, operator and layout optimizations, and caching.
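
To make the caching idea concrete, here is a minimal sketch of step-level caching in a denoising loop. It is an illustration under stated assumptions, not the implementation used by this variant: the names `StepCache`, `rel_l1_change`, and `expensive_block`, and the `threshold` value are all hypothetical, and the "block" is a toy function standing in for a costly transformer block. The idea is that when the input to a block changes very little between consecutive steps, the previous output can be reused instead of recomputed.

```python
from typing import Callable, List, Optional

Vector = List[float]


def rel_l1_change(current: Vector, reference: Vector) -> float:
    """Relative L1 distance between two equally sized vectors."""
    num = sum(abs(c - r) for c, r in zip(current, reference))
    den = sum(abs(r) for r in reference) or 1.0  # avoid division by zero
    return num / den


class StepCache:
    """Hypothetical wrapper: reuse the last output when the input barely changed."""

    def __init__(self, block: Callable[[Vector], Vector], threshold: float) -> None:
        self.block = block
        self.threshold = threshold  # assumed tuning knob, not a documented value
        self.last_input: Optional[Vector] = None
        self.last_output: Optional[Vector] = None
        self.computed = 0  # number of real block evaluations
        self.reused = 0    # number of cache hits

    def __call__(self, x: Vector) -> Vector:
        if (
            self.last_input is not None
            and self.last_output is not None
            and rel_l1_change(x, self.last_input) < self.threshold
        ):
            # Input is close to the one we last computed on: skip the block.
            self.reused += 1
            return self.last_output
        # Input drifted too far (or first call): recompute and refresh the cache.
        self.last_input = list(x)
        self.last_output = self.block(x)
        self.computed += 1
        return self.last_output


def expensive_block(x: Vector) -> Vector:
    """Toy stand-in for a costly model block."""
    return [2.0 * v + 1.0 for v in x]


cache = StepCache(expensive_block, threshold=0.05)
# Simulated block inputs over five denoising steps.
inputs = [[1.0, 1.0], [1.01, 1.0], [1.02, 1.0], [1.5, 1.0], [1.51, 1.0]]
outputs = [cache(x) for x in inputs]
print(f"computed={cache.computed}, reused={cache.reused}")
```

In this toy run the block is evaluated on only two of the five steps. The threshold controls the trade-off: a larger value skips more work but lets the reused output drift further from what a full recomputation would give, which is where any loss of visual quality would come from.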