
SeaArt and the Wan team kindly gave me early access to Wan2.6 before its official release, and I have been putting it to the test. This article is my early review of Wan2.6. Although I received early access, I have not received a single dollar in promotional compensation, so once again this is an honest review.

0. Overview

Wan2.6 is a video generation model developed by the Wan team at Alibaba Group and the successor to Wan2.5. Given that Wan2.5 was officially released only at the end of September, the pace of Wan2.6's development is astonishing. Like Veo3.1, it carries only a modest version bump, but how does it actually perform?

1. Looking Back at Wan2.5

Hoping I don't get in trouble with Alibaba, let's take a quick look back at the previous model. Wan2.5 shared a similar concept with Veo3 and essentially established itself as a "low-cost Veo3." We still clearly remember how it broke Veo3's monopoly as one of the few models supporting native, integrated audio generation. Several models now support integrated audio, but Wan's strength lies in its accessibility. Sora2, for example, supports integrated audio, but access to it is limited. Veo3 is available on many platforms, but it is very expensive. Wan2.5, by contrast, is offered cheaply on many platforms, and there are even ways to use it for free. That a model able to compete with Veo3 is so accessible is one of Wan2.5's greatest strengths.

2. Evolution Points of Wan2.6

Again, at the risk of getting in trouble for writing this: to be honest, Wan2.6 cannot be called a revolutionary leap. As its modest version number suggests, it does not bring any features that are entirely new to the industry. However, its performance has improved steadily, and some interesting features have been added (~~though they might look familiar~~).

First, Wan2.6 now supports videos of up to 1080p and 15 seconds in a single generation. Wan2.5 was limited to 10 seconds, so this is a welcome change. Image quality, prompt adherence, object consistency, and sound quality have also all improved steadily. A standout addition is a feature called "Starring," which lets pre-registered characters appear in generated videos, much like the "Cameo" feature in OpenAI's Sora2. In the next section, let's compare videos I actually generated.

3. Comparing Actual Quality

What struck me in the actual comparison was that, surprisingly, Wan2.5 produced many noticeable glitches (failures of physical consistency). I can't say whether my eyes have grown too accustomed to high-performance models over the last few months or it's an Alibaba conspiracy, but Wan2.6 glitches far less often than Wan2.5. Wan2.6 has also vastly improved its rendering of light reflections, which reminded me of how I felt when I first tried Veo3. Of course, if you supply an initial frame in Image-to-Video, the video model inherits its look, so Wan2.5 can also produce videos with beautiful light reflections. What makes Wan2.6 impressive is that it delivers excellent image quality even in Text-to-Video.

Let's look at some examples. Note that I could not get Wan2.6 to generate videos with integrated audio at 1080p no matter what I tried, so this comparison was conducted at 720p. This is likely an issue with the preview version and should be fine in the official release. Unless otherwise noted, the videos below are shown in the order Wan2.5, then Wan2.6.

Update: The model update on December 12th resolved this issue. It is now possible to generate videos with audio at 1080p.

The glitches are very apparent in the FPV drone video. Please ignore the difference in drone speed, which comes down to the random seed. Unfortunately, Wan2.6 ignored the "FPV" instruction, but it shows no other glitches. Wan2.5 shows drones appearing and disappearing, along with other minor glitches. The way a drone materializes out of the void and vanishes back into it suggests the model is wavering during sampling over whether to show the drone at all.

The ASMR video of glass fruit being slowly cut, a concept that was popular with Veo3, is a total victory for Wan2.6. Setting aside that the content itself is physically impossible, Wan2.5's output is clearly broken: an apple that stays unscarred after being stabbed with a knife makes water-splashing sounds, and the "slowly" instruction is completely ignored. Wan2.6 produced a different sound than I expected, but in this case it was Veo3 that got it wrong, and Wan2.6 could be considered physically correct.

I also tried another ASMR concept, and this too was a complete victory for Wan2.6. Even before considering sound quality, Wan2.5 fails visually. Wan2.6 shows no major glitches even with such unusual footage, which is clear evidence of its steady progress.

I also tried the "impossible task" of Japanese pronunciation. This may have been a bit unfair: Wan2.5's output is in a pretty terrible state. Wan2.6 is relatively better, but it still cannot be said to support Japanese narration.

Next, I had the models visualize a supernova explosion to check dynamic range and how each model interprets the prompt. Although no sound would actually be heard from a real supernova, I wanted to see how the Wan models would handle it. Wan2.5 is sensible in a way: while generating supernova-like visuals, it apparently concluded that there should be no explosion sound. Wan2.6 seems to interpret "supernova" as "explosion," and a celestial body resembling a white dwarf explodes with sound. Should this count as a glitch? As for dynamic range, neither model seems to have any issues.

The classic benchmark of dropping ink into water and watching it diffuse is an overwhelming win for Wan2.6, or perhaps rather an utter defeat for Wan2.5. Adding a green color that is not in the prompt is already a problem, but Wan2.5 also fails to depict the fluid motion accurately.

I love cars, so I often use car themes for video model comparisons. Here, Wan2.6 offers the more impressive perspective for a cut of this length. There are things to nitpick about the driving physics, but with this much movement, Wan2.5's glitches are not noticeable. The sound, however, differs considerably, and I can confidently say Wan2.6 has the better sound quality. The third video, from Veo3.1, is visually overwhelming.

For Image-to-Video with little movement, Wan2.6 is slightly better. This isn't to say Wan2.5's quality is bad, but Wan2.6 interprets the input image more skillfully when creating the video.

4. About the Starring Feature

One of the features newly added in Wan2.6 is "Starring." Like Sora2's "Cameo," it lets pre-registered characters appear in generated videos. The biggest difference from Sora2 is API availability: Sora2 does not offer an API that includes Cameo, whereas Wan2.6 does. The Wan team also claims that Starring can generate at higher resolutions than Sora2 and is intended for professional use.

Broadly speaking, the Starring user experience is the same as Sora2's Cameo, but there are some minor differences and usage tips worth explaining.

A major difference is that, unlike Sora2, Wan2.6 lets you set the duration of a Starring video to 5, 10, or 15 seconds. You can also choose among the 16:9, 4:3, 1:1, 3:4, and 9:16 aspect ratios, so the range of possible outputs is clearly wider.

Also, since you cannot see what input was used to register a character, you should write detailed prompts. The characters registered by default in Wan2.6 were likely registered in Chinese, so if you want a character to speak, they may start speaking Chinese unless you spell out the dialogue explicitly. This can happen even if your prompt is written in English.

Using this feature also made me keenly aware of how strict Sora2's censorship is. Sora2 blocks clearly SFW (Safe For Work) prompts with "unreasonable" censorship. Being unable to generate a video is one thing, but Sora2 sometimes blocks even character creation. This might be because the character I submitted resembles a celebrity I don't know. Granted, this feature carries real risks, so a certain degree of overly strict censorship can be considered healthy. Wan's censorship is more relaxed than Sora2's, so it may be possible to create deepfakes with Wan, though I haven't tried because I'd hate to get caught and scolded. To the Wan team, if you are reading this: please optimize your censorship.

5. Image Generation

Wan2.6 also supports image generation. Earlier models in the series already supported it, but image quality has improved in Wan2.6. The first image is from Wan2.5, and the second is from Wan2.6.

Despite being given the exact same prompt, Wan2.6 generated a more beautiful image. While Wan2.5 tries to be faithful to the prompt and generate realistic images, Wan2.6 prioritizes visual beauty in terms of color usage and contrast.

As you would expect of a model in this class, Wan2.6 also supports image editing. The quality is reasonably high, perhaps on par with Nano Banana. However, image-editing quality depends more on how the model interprets the input than on its generative capability, so what counts as "good" is partly a matter of personal preference. A bad result in image editing is typically a "misinterpretation," and the biggest problem is that we humans cannot systematize what the correct interpretation is. I will therefore avoid a detailed evaluation here, but I can say that Wan2.6's image editing has reached a respectable standard. One minor complaint about the specifications: Wan2.6's image editing fixes the batch size at 4, just like regular image generation. A batch size of 1 would suffice for editing, so this feels like a waste of computing power.

6. Summary

Wan2.6 shows a steady improvement in video generation performance over Wan2.5. Physical glitches have decreased significantly, and overall image quality has improved. Conceptually, Wan2.6 "greedily" adopts the strengths of other well-known models on the market. I previously described Wan2.5 as a "cheap Veo3," but Wan2.6 might be a "cheap Veo3 + cheap Sora2 + cheap Nano Banana." It is impressive that the team achieved these quality improvements and new features within just a few months of Wan2.5's release.

Finally, I would like to thank the Wan team and SeaArt for giving me this opportunity. Thank you for reading until the end.

【Advance Review】 Wan2.6