Unlock the Future of AI Video Creation with Wan 2.2

Wan 2.2: Where Text Becomes Cinema – Empower Your Vision with AI-Powered Video Magic

What is Wan 2.2?

Wan 2.2 is an advanced open-source large-scale video generative model developed by Alibaba, marking the world's first open-source Mixture-of-Experts (MoE) architecture for video diffusion models. Released on July 28, 2025, it builds on the success of Wan 2.1 by enhancing video generation capabilities with cinematic-level aesthetics, complex motion handling, and efficient high-definition outputs. Available on platforms like Hugging Face, ModelScope, and integrated into tools such as ComfyUI and Diffusers, Wan 2.2 supports text-to-video (T2V), image-to-video (I2V), and hybrid text-image-to-video (TI2V) modes. It's designed for creators seeking high-quality, prompt-adherent videos at resolutions up to 720p@24fps, running efficiently even on consumer-grade GPUs like the RTX 4090.
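
For a quick feel of the Diffusers route mentioned above, here is a minimal text-to-video sketch. It assumes a recent diffusers release that ships the WanPipeline integration; the checkpoint id, resolution, and frame count are illustrative placeholders, so verify them against the model card on Hugging Face:

    # Minimal T2V sketch via Diffusers (checkpoint id and settings are illustrative)
    import torch
    from diffusers import WanPipeline
    from diffusers.utils import export_to_video

    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.2-TI2V-5B-Diffusers",  # assumed repo id; check Hugging Face
        torch_dtype=torch.bfloat16,
    )
    pipe.to("cuda")

    frames = pipe(
        prompt="A slow cinematic dolly shot of a lighthouse at dusk, waves crashing",
        height=704,       # TI2V-5B's documented 704x1280 working size
        width=1280,
        num_frames=121,   # roughly 5 seconds at 24 fps
    ).frames[0]
    export_to_video(frames, "lighthouse.mp4", fps=24)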

What’s new in Wan 2.2

  • Mixture-of-Experts (MoE) Architecture

    Wan 2.2 introduces the world's first open-source MoE architecture for video diffusion models: a high-noise expert handles the initial layout and a low-noise expert refines the detail. The model totals 27B parameters but activates only 14B per step, boosting efficiency and quality over Wan 2.1's standard diffusion.

  • Benchmark Dominance and Integrations

    Wan 2.2 tops Wan-Bench 2.0, outperforming leading open-source and closed models, and integrates seamlessly with ComfyUI, Diffusers, and Hugging Face, including low-VRAM options and prompt extensions.

  • New Hybrid Model Variant (TI2V-5B)

    A dense 5B model with the high-compression Wan2.2-VAE supports hybrid text-to-video and image-to-video at 720p@24fps, generating a 5-second video in under 9 minutes on a consumer GPU like the RTX 4090, which makes it far more accessible than previous versions (see the back-of-envelope sketch after this list).

  • Expanded and Curated Training Data

    The training corpus contains 65.6% more images and 83.2% more videos than Wan 2.1's, curated with labels for lighting, composition, contrast, and color, enabling cinematic-level aesthetics and superior prompt adherence.
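
To make the "high-compression Wan2.2-VAE" point concrete, here is a back-of-envelope estimate of the latent grid the TI2V-5B variant denoises for a 5-second 720p clip. It assumes the reported 16x16 spatial and 4x temporal compression; treat the numbers as illustrative rather than official:

    # Rough latent-size estimate (illustrative; assumes 16x16x4 VAE compression)
    frames, height, width = 121, 704, 1280   # ~5 s at 24 fps, TI2V-5B working size
    t_stride, s_stride = 4, 16               # assumed temporal/spatial strides

    latent_t = (frames - 1) // t_stride + 1  # 31 latent frames
    latent_h = height // s_stride            # 44
    latent_w = width // s_stride             # 80
    print(latent_t, latent_h, latent_w)      # diffusion runs on a 31x44x80 grid,
                                             # not on 121 frames of 704x1280 pixels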

Key Features of Wan 2.2

Scene-Setting Mastery: MoE Architecture for Dynamic Expertise

Wan 2.2 employs a Mixture-of-Experts (MoE) design with high-noise and low-noise experts, totaling 27B parameters but activating only 14B per step for efficiency. This allows superior handling of complex motions and semantics, outperforming traditional models in fluidity and detail.
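
The expert hand-off can be pictured as a switch on the current noise level. The sketch below is purely conceptual, not the official implementation; it assumes denoising steps run from high noise to low noise and that a single boundary timestep decides which expert is active:

    # Conceptual sketch of the two-expert MoE denoising schedule (not official code)
    def denoise(latents, timesteps, high_noise_expert, low_noise_expert, boundary_t):
        """high_noise_expert drafts the global layout early; low_noise_expert
        refines detail later, so only one ~14B expert is active at each step."""
        for t in timesteps:                    # e.g. 1000 -> 0, noisiest first
            expert = high_noise_expert if t >= boundary_t else low_noise_expert
            latents = expert.step(latents, t)  # hypothetical single denoising step
        return latents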

Director's Cut: Cinematic Aesthetics and Prompt Precision

Trained on data curated with detailed labels for lighting, composition, contrast, and color, Wan 2.2 delivers movie-grade visuals. It excels at prompt adherence, generating natural animation without excessive hallucination, which allows precise creative control.

Action-Packed Frames: Enhanced Motion and Resolution Support

Trained on 65.6% more images and 83.2% more videos than Wan 2.1, Wan 2.2 reduces frame flickering and supports 720p@24fps videos up to 5 seconds long. The TI2V-5B variant enables fast generation on budget hardware.

Special Effects Reel: Multimodal Versatility

Wan 2.2 integrates text, images, and video seamlessly, including image-to-video transitions and style consistency across modes. Features like particle systems, lighting effects, and LoRA training optimizations make it suitable for diverse applications.

Wan 2.2 vs Wan 2.1 vs Other Video Models

| Feature | Wan 2.2 | Wan 2.1 | Kling AI (1.5/2.0) | OpenAI Sora | Luma AI Dream Machine |
| --- | --- | --- | --- | --- | --- |
| Architecture | Mixture-of-Experts (MoE) with high/low-noise experts; first open-source MoE for video diffusion | Standard diffusion model; no MoE | Proprietary transformer-based; focuses on temporal consistency | Proprietary diffusion with advanced transformer; emphasis on world simulation | Diffusion-based; emphasis on surreal, dynamic effects |
| Parameters | 27B total (14B active per step); 5B hybrid variant | 14B and 1.3B dense variants; less efficient scaling | Not disclosed (proprietary; likely 10B+) | Not disclosed (proprietary; rumored 10B+) | Not disclosed (proprietary; mid-range) |
| Max resolution/fps | 720p@24fps (native 1080p in some previews); clips up to 5 s | 480p/720p at lower frame rates; shorter clips with more artifacts | 1080p@30fps; videos up to 2 min | 1080p at variable fps; up to 1 min (based on demos) | 720p at variable fps; clips up to 10 s |
| Open source | Yes (Apache 2.0; available on Hugging Face/ModelScope) | Yes (Apache 2.0) | No (commercial; API access via Kuaishou) | No (closed; limited access via OpenAI) | No (commercial; web/app-based) |
| Key strengths | Superior prompt adherence, cinematic aesthetics, efficient on consumer GPUs; tops Wan-Bench 2.0 | Good baseline quality; accessible for open-source users | Excellent motion fluidity and extensions; competitive with Sora | Realistic physics and long-form coherence; high creative potential | Surreal, artistic outputs; fast generation for short clips |
| Weaknesses | Full A14B models need 80GB+ VRAM; community optimization still needed for speed | More artifacts, weaker motion consistency; smaller training data leads to hallucinations | Proprietary limits customization; higher API costs | Not publicly available; ethical concerns around training data | Inconsistent realism; prone to warping in complex scenes |
| Hardware requirements | RTX 4090 for the 5B model (~9 min/clip); multi-GPU for larger variants | Similar, but less optimized; higher VRAM needs for comparable quality | Cloud-based; no local run | Cloud-based; no local access | Cloud-based; no local access |
| Benchmark performance | Tops Wan-Bench 2.0; better convergence and lower loss than 2.1 | Solid, but outperformed by 2.2; strong among open-source models | Strong in user tests vs. Sora/Luma; excels on temporal metrics | Leads creative benchmarks (demos show superior coherence) | High in qualitative demos; no public benchmarks |

How to Use Wan 2.2

  • Install Dependencies:

    Clone the GitHub repo and install the requirements (PyTorch >= 2.4.0 required):

      git clone https://github.com/Wan-Video/Wan2.2.git
      cd Wan2.2
      pip install -r requirements.txt

  • Download Models:

    Use the Hugging Face CLI to fetch T2V-A14B, I2V-A14B, or TI2V-5B, for example:

      huggingface-cli download Wan-AI/Wan2.2-T2V-A14B --local-dir ./Wan2.2-T2V-A14B

  • Generate Videos:

    For text-to-video, run:

      python generate.py --task t2v-A14B --size 1280*720 --ckpt_dir ./Wan2.2-T2V-A14B --prompt "Your detailed prompt"

    Add --offload_model True to reduce peak memory usage on a single GPU.
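
If you work through the Diffusers integration instead of the reference scripts, a comparable memory-saving switch exists there as well; a minimal sketch, assuming the same illustrative checkpoint id as in the earlier example:

    # Hedged sketch: CPU offloading in Diffusers, a rough analogue of the
    # reference script's --offload_model True (checkpoint id is an assumption)
    import torch
    from diffusers import WanPipeline

    pipe = WanPipeline.from_pretrained(
        "Wan-AI/Wan2.2-TI2V-5B-Diffusers", torch_dtype=torch.bfloat16
    )
    pipe.enable_model_cpu_offload()  # keeps idle submodules on the CPU, cutting peak VRAM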

FAQs of Wan 2.2

  • What resolutions does Wan 2.2 support?

    Wan 2.2 supports 480p and 720p at 24fps, with the TI2V-5B model optimized for 1280x704 or 704x1280.

  • Is Wan 2.2 free to use?

    Yes. It's open-source under the Apache 2.0 license, with code and weights available on Hugging Face and ModelScope.

  • How does Wan 2.2 handle hardware requirements?

    The TI2V-5B model generates a 5-second 720p video on a single RTX 4090 in under 9 minutes, making it accessible to non-enterprise users.

  • Can I fine-tune Wan 2.2 with LoRA?

    While not explicitly detailed in the release, its architecture supports style training, with community integrations emerging.

  • Where can I test Wan 2.2 demos?

    Explore demos on Hugging Face spaces or use ComfyUI for interactive testing and experimentation.

  • What types of video generation does Wan 2.2 support?

    Wan 2.2 supports text-to-video (T2V), image-to-video (I2V), and hybrid text-image-to-video (TI2V) modes, offering flexibility for diverse creative projects.

  • How does Wan 2.2 improve prompt adherence?

    Its curated training data and MoE architecture ensure high fidelity to text and image prompts, producing videos with accurate details and minimal errors.

  • Is multi-GPU support available for Wan 2.2?

    Yes. The reference repository supports multi-GPU inference via torchrun, with FSDP and sequence-parallel (Ulysses) options that significantly speed up generation for the larger A14B variants.