Unlock the Future of AI Video Creation with Wan 2.2
Wan 2.2: Where Text Becomes Cinema – Empower Your Vision with AI-Powered Video Magic
What is Wan 2.2?
Wan 2.2 is an advanced open-source large-scale video generative model developed by Alibaba, and the first open-source video diffusion model built on a Mixture-of-Experts (MoE) architecture. Released on July 28, 2025, it builds on the success of Wan 2.1 with cinematic-level aesthetics, complex motion handling, and efficient high-definition output. Available on Hugging Face and ModelScope, and integrated into tools such as ComfyUI and Diffusers, Wan 2.2 supports text-to-video (T2V), image-to-video (I2V), and hybrid text-image-to-video (TI2V) modes. It's designed for creators who want high-quality, prompt-adherent videos at up to 720p and 24fps, running efficiently even on consumer-grade GPUs like the RTX 4090.
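For a quick sense of the Diffusers route, here is a minimal text-to-video sketch. The `WanPipeline` class and `export_to_video` helper exist in recent Diffusers releases, but the checkpoint id, frame count, and guidance value below are assumptions to verify against the Hugging Face model card rather than official defaults.

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Assumed Diffusers-format checkpoint id; confirm the exact name on Hugging Face.
model_id = "Wan-AI/Wan2.2-T2V-A14B-Diffusers"

pipe = WanPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.to("cuda")

frames = pipe(
    prompt="A slow cinematic dolly shot of a lighthouse at dusk, waves crashing below",
    negative_prompt="blurry, distorted, low quality",
    height=720,
    width=1280,
    num_frames=81,       # assumed value; Wan-style pipelines expect 4k+1 frame counts
    guidance_scale=4.0,  # assumed setting; tune per the model card
).frames[0]

export_to_video(frames, "lighthouse.mp4", fps=24)
```

On GPUs with less VRAM, swapping `pipe.to("cuda")` for `pipe.enable_model_cpu_offload()` trades speed for a much smaller memory footprint.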
What’s new in Wan 2.2
Mixture-of-Experts (MoE) Architecture
Wan 2.2 introduces the world's first open-source MoE architecture for video diffusion models: a high-noise expert handles the overall layout in early denoising steps, and a low-noise expert refines detail in later steps. The model totals 27B parameters but activates only 14B per step, boosting both efficiency and quality over Wan 2.1's standard diffusion.
Benchmark Dominance and Integrations
Tops Wan-Bench 2.0, outperforming both open-source and closed models; seamlessly integrates with ComfyUI, Diffusers, and Hugging Face for easy use, including low-VRAM options and prompt extensions.
New Hybrid Model Variant (TI2V-5B)
A dense 5B model built on the high-compression Wan2.2-VAE that handles both text-to-video and image-to-video at 720p@24fps. It generates a 5-second clip in under 9 minutes on a consumer GPU like the RTX 4090, making it far more accessible than previous versions.
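As an illustration of the hybrid model's image-to-video side, the sketch below assumes the TI2V-5B checkpoint is exposed through Diffusers' `WanImageToVideoPipeline`; both the checkpoint id and that pairing are assumptions to confirm on the model card, and the reference image URL is a placeholder.

```python
import torch
from diffusers import WanImageToVideoPipeline
from diffusers.utils import export_to_video, load_image

# Assumed checkpoint id for the hybrid 5B variant; verify on Hugging Face.
model_id = "Wan-AI/Wan2.2-TI2V-5B-Diffusers"

pipe = WanImageToVideoPipeline.from_pretrained(model_id, torch_dtype=torch.bfloat16)
pipe.enable_model_cpu_offload()  # keep peak VRAM within consumer-GPU range

image = load_image("https://example.com/reference_frame.jpg")  # placeholder input frame

frames = pipe(
    image=image,
    prompt="The camera slowly pushes in as autumn leaves drift across the scene",
    height=704,
    width=1280,          # 1280x704 is the resolution the 5B variant targets
    num_frames=121,      # assumed: roughly 5 s at 24 fps, in the 4k+1 form Wan pipelines expect
    guidance_scale=5.0,  # assumed setting; tune per the model card
).frames[0]

export_to_video(frames, "ti2v_5b_clip.mp4", fps=24)
```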
Expanded and Curated Training Data
The training set contains 65.6% more images and 83.2% more videos than Wan 2.1's, curated with labels for lighting, composition, contrast, and color, enabling cinematic-level aesthetics and superior prompt adherence.
Key Features of Wan 2.2
Scene-Setting Mastery: MoE Architecture for Dynamic Expertise
Wan 2.2 employs a Mixture-of-Experts (MoE) design with high-noise and low-noise experts, totaling 27B parameters but activating only 14B per step for efficiency. This allows superior handling of complex motions and semantics, outperforming traditional models in fluidity and detail.
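To make the routing idea concrete, here is a toy sketch (not the actual Wan 2.2 implementation): two experts are instantiated, but only the one matched to the current noise level runs at each denoising step. The switch threshold, module names, and tensor shapes are all hypothetical.

```python
import torch
import torch.nn as nn

class DummyExpert(nn.Module):
    """Stand-in for a full 14B diffusion transformer; just a placeholder layer."""
    def __init__(self, channels: int = 16):
        super().__init__()
        self.proj = nn.Conv3d(channels, channels, kernel_size=1)

    def forward(self, latents: torch.Tensor, timestep: int) -> torch.Tensor:
        return self.proj(latents)

class TwoExpertDenoiser(nn.Module):
    """Toy version of Wan 2.2-style MoE routing: twice the parameters exist,
    but only one expert is active per step, so per-step compute stays flat."""
    def __init__(self, switch_timestep: int = 500):
        super().__init__()
        self.high_noise_expert = DummyExpert()  # early, noisy steps: global layout and motion
        self.low_noise_expert = DummyExpert()   # late, cleaner steps: texture and fine detail
        self.switch_timestep = switch_timestep  # hypothetical hand-off point

    def forward(self, latents: torch.Tensor, timestep: int) -> torch.Tensor:
        expert = (self.high_noise_expert if timestep >= self.switch_timestep
                  else self.low_noise_expert)
        return expert(latents, timestep)

# Toy video latents shaped (batch, channels, frames, height, width).
latents = torch.randn(1, 16, 21, 90, 160)
denoiser = TwoExpertDenoiser()
print(denoiser(latents, timestep=800).shape)  # routed to the high-noise expert
```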
Director's Cut: Cinematic Aesthetics and Prompt Precision
Trained on data curated with detailed labels for lighting, composition, contrast, and color, Wan 2.2 delivers movie-grade visuals. It excels at prompt adherence, generating natural motion without excessive hallucination, which makes it well suited to precise creative control.
Action-Packed Frames: Enhanced Motion and Resolution Support
With 65.6% more images and 83.2% more videos in its training data than Wan 2.1, Wan 2.2 reduces frame flickering and supports 720p@24fps videos up to 5 seconds long. The TI2V-5B variant enables fast generation on budget hardware.
Special Effects Reel: Multimodal Versatility
Wan 2.2 blends text, image, and video inputs seamlessly, with image-to-video transitions and consistent style across modes. Strengths such as particle and lighting effects, plus LoRA training optimizations, make it well suited to a wide range of applications.
Wan 2.2 vs Wan 2.1 vs Other Video Models
| Feature | Wan 2.2 | Wan 2.1 | Kling AI (1.5/2.0) | OpenAI Sora | Luma AI Dream Machine |
|---|---|---|---|---|---|
| Architecture | Mixture-of-Experts (MoE) with high/low-noise experts; first open-source MoE for video diffusion | Standard diffusion model; no MoE | Proprietary transformer-based; focuses on temporal consistency | Proprietary diffusion with advanced transformer; emphasis on world simulation | Diffusion-based with emphasis on surreal and dynamic effects |
| Parameters | 27B total (14B active per step); 5B hybrid variant | ~11B (estimated; less efficient scaling) | Not disclosed (proprietary; likely 10B+) | Not disclosed (proprietary; rumored 10B+) | Not disclosed (proprietary; mid-range) |
| Max Resolution/FPS | 720p@24fps (native 1080p in some previews); up to 5s videos | 480p/720p at lower frame rates; shorter clips with more artifacts | 1080p@30fps; up to 2min videos | 1080p at variable frame rates; up to 1min (based on demos) | 720p at variable frame rates; up to 10s clips |
| Open-Source | Yes (Apache 2.0 license; available on Hugging Face/ModelScope) | Yes (Apache 2.0 license) | No (commercial; API access via Kuaishou) | No (closed; limited access via OpenAI) | No (commercial; web/app-based) |
| Key Strengths | Superior prompt adherence, cinematic aesthetics, efficient on consumer GPUs; tops Wan-Bench 2.0 | Good baseline quality; accessible for open-source users | Excellent motion fluidity and extensions; competitive with Sora | Realistic physics and long-form coherence; high creative potential | Surreal, artistic outputs; fast generation for short clips |
| Weaknesses | Requires 80GB+ VRAM for full models; community optimization needed for speed | Higher artifacts, less motion consistency; smaller data leads to hallucinations | Proprietary limits customization; higher costs for API | Not publicly available; ethical concerns with training data | Inconsistent realism; prone to warping in complex scenes |
| Hardware Requirements | RTX 4090 for 5B model (~9min/clip); multi-GPU for larger variants | Similar but less optimized; higher VRAM needs for quality | Cloud-based; no local run | Cloud-based; no local access | Cloud-based; no local access |
| Benchmark Performance | Tops Wan-Bench 2.0; better convergence and loss than 2.1 | Solid but outperformed by 2.2; good in open-source category | Strong in user tests vs. Sora/Luma; excels in temporal metrics | Leading in creative benchmarks (demos show superiority in coherence) | High in qualitative demos; no public benchmarks |
How to Use Wan 2.2
Install Dependencies:
Clone the GitHub repo (`git clone https://github.com/Wan-Video/Wan2.2.git`) and install dependencies with `pip install -r requirements.txt` (PyTorch >= 2.4.0 required).
Download Models:
Use the Hugging Face CLI to fetch T2V-A14B, I2V-A14B, or TI2V-5B (e.g., `huggingface-cli download Wan-AI/Wan2.2-T2V-A14B`).
Generate Videos:
For T2V: `python generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --prompt "Your detailed prompt"`. Add `--offload_model True` for better memory efficiency; a scripted version of the download and generation steps is sketched below.
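If you'd rather script steps 2 and 3 than type them by hand, the snippet below is one way to do it: it pulls the checkpoint with `huggingface_hub` and then calls `generate.py` with the same flags shown above. The prompt is just an example; everything else mirrors the commands in this section.

```python
import subprocess
from huggingface_hub import snapshot_download

# Step 2, programmatically: fetch the T2V-A14B weights into the folder
# that generate.py's --ckpt_dir flag will point at.
ckpt_dir = snapshot_download(
    repo_id="Wan-AI/Wan2.2-T2V-A14B",
    local_dir="./Wan2.2-T2V-A14B",
)

# Step 3: run the repo's generate.py with the flags from the example above.
subprocess.run(
    [
        "python", "generate.py",
        "--task", "t2v-A14B",
        "--size", "1280*720",
        "--ckpt_dir", ckpt_dir,
        "--offload_model", "True",
        "--prompt", "A red fox trotting through fresh snow at golden hour",
    ],
    check=True,
)
```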
FAQs of Wan 2.2
What resolutions does Wan 2.2 support?
Wan 2.2 supports 480p and 720p at 24fps, with the TI2V-5B model optimized for 1280x704 or 704x1280.
Is Wan 2.2 free to use?
Yes, it's open-source under the Apache 2.0 license, with weights available on Hugging Face and ModelScope.
How does Wan 2.2 handle hardware requirements?
The TI2V-5B model generates a 5-second 720p video in under 9 minutes on a single RTX 4090, making it accessible to non-enterprise users.
Can I fine-tune Wan 2.2 with LoRA?
While not explicitly detailed in the release, its architecture supports style training, with community integrations emerging.
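As a hedged illustration of what that typically looks like once community LoRAs ship in a Diffusers-compatible format (the checkpoint and LoRA repository ids below are placeholders, and the availability of the standard Diffusers LoRA hooks on the Wan pipeline is an assumption):

```python
import torch
from diffusers import WanPipeline
from diffusers.utils import export_to_video

# Placeholder ids: swap in a real Diffusers-format base checkpoint and a
# community-trained style LoRA once they exist.
pipe = WanPipeline.from_pretrained(
    "Wan-AI/Wan2.2-T2V-A14B-Diffusers", torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

pipe.load_lora_weights("some-user/wan22-watercolor-lora", adapter_name="watercolor")
pipe.set_adapters(["watercolor"], adapter_weights=[0.8])  # blend strength of the style LoRA

frames = pipe(prompt="A watercolor-style city street in the rain, gentle camera pan").frames[0]
export_to_video(frames, "watercolor_street.mp4", fps=24)
```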
Where can I test Wan 2.2 demos?
Explore demos on Hugging Face spaces or use ComfyUI for interactive testing and experimentation.
What types of video generation does Wan 2.2 support?
Wan 2.2 supports text-to-video (T2V), image-to-video (I2V), and hybrid text-image-to-video (TI2V) modes, offering flexibility for diverse creative projects.
How does Wan 2.2 improve prompt adherence?
Its curated training data and MoE architecture ensure high fidelity to text and image prompts, producing videos with accurate details and minimal errors.
Is multi-GPU support available for Wan 2.2?
Yes, Wan 2.2 supports multi-GPU configurations, which can significantly speed up video generation for larger projects.