Add `library_name: diffusers` and comprehensive usage instructions
This PR improves the model card for Wan-Alpha by:
- Adding the `library_name: diffusers` metadata tag. This enables the automated "how to use" widget on the model page, improving user experience and model discoverability; a minimal loading sketch is shown after this list. (This decision follows a majority vote across colleague analyses and acknowledges potential evidence beyond the directly quoted text snippets.)
- Incorporating comprehensive "Quick Start" and "Official ComfyUI Version" instructions, including code snippets and model download links, directly from the GitHub repository's README. This significantly enhances the model's usability by providing immediate, detailed usage guidance on the Hugging Face Hub page, replacing the previous generic link to GitHub.
- Adding the paper title and abstract at the top of the model card for better context.
- Preserving existing relevant information such as project page, GitHub, and arXiv links, as well as the model's overall structure and visual elements.
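For reference, here is a minimal, illustrative sketch of the `diffusers` loading path that the new tag advertises. It assumes the diffusers-format base checkpoint `Wan-AI/Wan2.1-T2V-14B-Diffusers` and the stock `WanPipeline`/`AutoencoderKLWan` classes; Wan-Alpha's DoRA weights and modified VAE decoder are applied by the project's own scripts (see the Quick Start section in the diff below), so this sketch covers only the base text-to-video pipeline, not RGBA output.

```python
import torch
from diffusers import AutoencoderKLWan, WanPipeline
from diffusers.utils import export_to_video

# Illustrative only: loads the diffusers-format Wan2.1 base model that
# `library_name: diffusers` advertises. Wan-Alpha's DoRA weights and modified
# VAE decoder are NOT applied here; use the repository's
# generate_dora_lightx2v.py script for actual RGBA output.
model_id = "Wan-AI/Wan2.1-T2V-14B-Diffusers"  # assumed diffusers-format repo id

vae = AutoencoderKLWan.from_pretrained(model_id, subfolder="vae", torch_dtype=torch.float32)
pipe = WanPipeline.from_pretrained(model_id, vae=vae, torch_dtype=torch.bfloat16)
pipe.to("cuda")

prompt = (
    "This video has a transparent background. Close-up shot. "
    "A colorful parrot flying. Realistic style."
)
frames = pipe(prompt=prompt, height=480, width=832, num_frames=81).frames[0]
export_to_video(frames, "parrot.mp4", fps=16)
```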
````diff
@@ -1,9 +1,19 @@
 ---
-license: apache-2.0
 base_model:
 - Wan-AI/Wan2.1-T2V-14B
+license: apache-2.0
 pipeline_tag: text-to-video
+library_name: diffusers
 ---
+
+# Wan-Alpha: High-Quality Text-to-Video Generation with Alpha Channel
+
+[Paper Link](https://huggingface.co/papers/2509.24979)
+
+## Abstract
+
+RGBA video generation, which includes an alpha channel to represent transparency, is gaining increasing attention across a wide range of applications. However, existing methods often neglect visual quality, limiting their practical usability. In this paper, we propose Wan-Alpha, a new framework that generates transparent videos by learning both RGB and alpha channels jointly. We design an effective variational autoencoder (VAE) that encodes the alpha channel into the RGB latent space. Then, to support the training of our diffusion transformer, we construct a high-quality and diverse RGBA video dataset. Compared with state-of-the-art methods, our model demonstrates superior performance in visual quality, motion realism, and transparency rendering. Notably, our model can generate a wide variety of semi-transparent objects, glowing effects, and fine-grained details such as hair strands. The released model is available on our website: this https URL.
+
 <div align="center">
 
 <h1>
@@ -45,9 +55,90 @@ pipeline_tag: text-to-video
 
 ## 🚀 Quick Start
 
+### 1. Environment Setup
+```bash
+# Clone the project repository
+git clone https://github.com/WeChatCV/Wan-Alpha.git
+cd Wan-Alpha
+
+# Create and activate Conda environment
+conda create -n Wan-Alpha python=3.11 -y
+conda activate Wan-Alpha
+
+# Install dependencies
+pip install -r requirements.txt
+```
+
+### 2. Model Download
+Download [Wan2.1-T2V-14B](https://huggingface.co/Wan-AI/Wan2.1-T2V-14B)
+
+Download [Lightx2v-T2V-14B](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)
+
+Download [Wan-Alpha VAE](https://huggingface.co/htdong/Wan-Alpha)
+
+### 🧪 Usage
+You can test our model with:
+```bash
+torchrun --nproc_per_node=8 --master_port=29501 generate_dora_lightx2v.py --size 832*480 \
+--ckpt_dir "path/to/your/Wan-2.1/Wan2.1-T2V-14B" \
+--dit_fsdp --t5_fsdp --ulysses_size 8 \
+--vae_lora_checkpoint "path/to/your/decoder.bin" \
+--lora_path "path/to/your/epoch-13-1500.safetensors" \
+--lightx2v_path "path/to/your/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors" \
+--sample_guide_scale 1.0 \
+--frame_num 81 \
+--sample_steps 4 \
+--lora_ratio 1.0 \
+--lora_prefix "" \
+--prompt_file ./data/prompt.txt \
+--output_dir ./output
+```
+You can specify the weights of `Wan2.1-T2V-14B` with `--ckpt_dir`, `LightX2V-T2V-14B` with `--lightx2v_path`, `Wan-Alpha-VAE` with `--vae_lora_checkpoint`, and `Wan-Alpha-T2V` with `--lora_path`. The rendered RGBA videos, shown over a checkerboard background, and the PNG frames are written to the directory given by `--output_dir`.
+
+**Prompt Writing Tip:** Specify that the video background is transparent, along with the visual style, the shot type (close-up, medium shot, wide shot, or extreme close-up), and a description of the main subject. Prompts support both Chinese and English input.
 
+```bash
+# An example prompt.
+This video has a transparent background. Close-up shot. A colorful parrot flying. Realistic style.
+```
+
+## 🎨 Official ComfyUI Version
+
+Note: we have reorganized our models so that they load directly into ComfyUI. These files differ from the ones listed above.
+
+### 1. Download models
+- The Wan DiT base model: [wan2.1_t2v_14B_fp16.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/diffusion_models/wan2.1_t2v_14B_fp16.safetensors)
+- The Wan text encoder: [umt5_xxl_fp8_e4m3fn_scaled.safetensors](https://huggingface.co/Comfy-Org/Wan_2.1_ComfyUI_repackaged/blob/main/split_files/text_encoders/umt5_xxl_fp8_e4m3fn_scaled.safetensors)
+- The LightX2V model: [lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors](https://huggingface.co/Kijai/WanVideo_comfy/blob/main/Lightx2v/lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors)
+- Our RGBA Dora: [epoch-13-1500_changed.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/epoch-13-1500_changed.safetensors)
+- Our RGB VAE Decoder: [wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors)
+- Our Alpha VAE Decoder: [wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors](https://huggingface.co/htdong/Wan-Alpha_ComfyUI/blob/main/wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors)
 
+### 2. Copy the files into the `ComfyUI/models` folder and organize them as follows:
+
+```
+ComfyUI/models
+├── diffusion_models
+│   └── wan2.1_t2v_14B_fp16.safetensors
+├── loras
+│   ├── epoch-13-1500_changed.safetensors
+│   └── lightx2v_T2V_14B_cfg_step_distill_v2_lora_rank64_bf16.safetensors
+├── text_encoders
+│   └── umt5_xxl_fp8_e4m3fn_scaled.safetensors
+└── vae
+    ├── wan_alpha_2.1_vae_alpha_channel.safetensors.safetensors
+    └── wan_alpha_2.1_vae_rgb_channel.safetensors.safetensors
+```
+
+### 3. Install our custom RGBA video previewer and PNG-frames zip packer: copy [RGBA_save_tools.py](comfyui/RGBA_save_tools.py) into the `ComfyUI/custom_nodes` folder.
+
+- Thanks to @mr-lab for an improved WebP version! You can find it in this [issue](https://github.com/WeChatCV/Wan-Alpha/issues/4).
+
+### 4. Example workflow: [wan_alpha_t2v_14B.json](comfyui/wan_alpha_t2v_14B.json)
+
+<img src="comfyui/comfyui.jpg" style="margin:auto;"/>
+
+---
 
 ## 🤝 Acknowledgements
 
@@ -81,4 +172,4 @@ If you find our work helpful for your research, please consider citing our paper
 
 ## 💬 Contact Us
 
-If you have any questions or suggestions, feel free to reach out via [GitHub Issues](https://github.com/WeChatCV/Wan-Alpha/issues) . We look forward to your feedback!
+If you have any questions or suggestions, feel free to reach out via [GitHub Issues](https://github.com/WeChatCV/Wan-Alpha/issues) . We look forward to your feedback!
````