nielsr (HF Staff) committed
Commit 0e8296e · verified · 1 Parent(s): 9984cfb

Improve model card for VORTA: Add paper/code links, abstract, and detailed usage

This PR significantly enhances the model card for VORTA by adding:
- An explicit link to the official Hugging Face paper page.
- A direct link to the GitHub repository for easy access to the code.
- The comprehensive abstract of the paper, providing a detailed overview of the research.
- Detailed installation instructions and a "Sample Usage (Inference)" section with code snippets for running inference, directly adapted from the official GitHub repository. This makes it easier for users to get started.
- Acknowledgements and a BibTeX citation, ensuring proper attribution.

These additions make the model card more informative and user-friendly, providing a more complete documentation of the artifact.

Files changed (1)
  README.md  +49 -6
README.md CHANGED
@@ -1,27 +1,70 @@
  ---
- license: mit
  base_model:
  - Wan-AI/Wan2.1-T2V-14B-Diffusers
  - hunyuanvideo-community/HunyuanVideo
- pipeline_tag: text-to-video
  library_name: diffusers
  ---

  # VORTA: Efficient Video Diffusion via Routing Sparse Attention

  > TL;DR - VORTA accelerates video diffusion transformers by sparse attention and dynamic routing, achieving speedup with negligible quality loss.

- ## Quick Start

- 1. Download the checkpoints into the `./results` directory under the VORTA GitHub code repository.
  ```bash
  git lfs install
  git clone [email protected]:anonymous728/VORTA
  # mv VORTA/<model_name> results/, <model_name>: wan-14B, hunyuan; e.g.
  mv VORTA/wan-14B results/
  ```
- _Other alternative methods to download the models can be found [here](https://huggingface.co/docs/hub/models-downloading#using-git)._

- 2. Follow the `README.md` instructions to run the sampling with speedup. 🤗

  ---
  base_model:
  - Wan-AI/Wan2.1-T2V-14B-Diffusers
  - hunyuanvideo-community/HunyuanVideo
  library_name: diffusers
+ license: mit
+ pipeline_tag: text-to-video
  ---

  # VORTA: Efficient Video Diffusion via Routing Sparse Attention

+ [📚 Paper](https://huggingface.co/papers/2505.18809) | [💻 Code](https://github.com/wenhao728/VORTA)
+
  > TL;DR - VORTA accelerates video diffusion transformers by sparse attention and dynamic routing, achieving speedup with negligible quality loss.

+ ## Abstract
+ Video diffusion transformers have achieved remarkable progress in high-quality video generation, but remain computationally expensive due to the quadratic complexity of attention over high-dimensional video sequences. Recent acceleration methods improve efficiency by exploiting the local sparsity of attention scores, yet they often struggle to accelerate long-range computation. To address this problem, we propose VORTA, an acceleration framework with two novel components: 1) a sparse attention mechanism that efficiently captures long-range dependencies, and 2) a routing strategy that adaptively replaces full 3D attention with specialized sparse attention variants. VORTA achieves a $1.76\times$ end-to-end speedup without loss of quality on VBench. Furthermore, it can seamlessly integrate with various other acceleration methods, such as model caching and step distillation, reaching up to a $14.41\times$ speedup with negligible performance degradation. VORTA demonstrates its efficiency and enhances the practicality of video diffusion transformers in real-world settings.

+ ## Installation
+ Install PyTorch. We have tested the code with PyTorch 2.6.0 and CUDA 12.6, but it should work with other versions as well. You can install PyTorch using the following command:
+ ```
+ pip install torch==2.6.0 torchvision==0.21.0 --index-url https://download.pytorch.org/whl/cu126
+ ```

+ Install the dependencies:
+ ```
+ python -m pip install -r requirements.txt
+ ```
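The `requirements.txt` step above assumes you are working inside a checkout of the VORTA code repository, with `requirements.txt` at the repo root. A minimal sketch of getting there, using the GitHub URL linked in the card header:

```bash
# Fetch the VORTA code and install its dependencies
# (an active Python environment with the PyTorch install above is assumed).
git clone https://github.com/wenhao728/VORTA
cd VORTA
python -m pip install -r requirements.txt
```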
+
+ ## Sample Usage (Inference)
+ We use general scripts to demonstrate the usage of our method. You can find the detailed scripts for each model in the `scripts` folder of the [VORTA GitHub repository](https://github.com/wenhao728/VORTA):
+ - HunyuanVideo: `scripts/hunyuan/inference.sh`
+ - Wan 2.1: `scripts/wan/inference.sh`
+
+ First, download the ready-to-use router weights. From the root of the VORTA GitHub code repository, clone this model repository (as `VORTA`) and move the checkpoints into `./results`:
  ```bash
  git lfs install
  git clone [email protected]:anonymous728/VORTA
  # mv VORTA/<model_name> results/, <model_name>: wan-14B, hunyuan; e.g.
  mv VORTA/wan-14B results/
  ```
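If `git lfs` is not available, the same router weights can likely be fetched with the Hugging Face Hub CLI instead. A hedged alternative sketch; the repo id is taken from the clone URL above, and the Hub CLI is assumed to be installed:

```bash
# Alternative download of the router weights without git-lfs.
# Requires the Hub CLI: pip install -U "huggingface_hub[cli]"
huggingface-cli download anonymous728/VORTA --local-dir VORTA
mv VORTA/wan-14B results/   # or VORTA/hunyuan for HunyuanVideo
```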
 
+ Run the video DiTs with VORTA for acceleration (example for `wan` model):
+ ```bash
+ CUDA_VISIBLE_DEVICES=0 python scripts/wan/inference.py \
+     --pretrained_model_path Wan-AI/Wan2.1-T2V-14B-Diffusers \
+     --val_data_json_file prompt.json \
+     --output_dir results/wan-14B/vorta \
+     --resume_dir results/wan-14B/train \
+     --resume ckpt/step-000100 \
+     --enable_cpu_offload \
+     --seed 1234
+ ```
+ For the `hunyuan` model, replace `wan` with `hunyuan` in the script path and output directory, and use `hunyuanvideo-community/HunyuanVideo` as the `--pretrained_model_path`.
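Following that substitution, the HunyuanVideo invocation would look roughly like the sketch below. The `results/hunyuan/*` paths and the reused `ckpt/step-000100` checkpoint name are assumptions, so adjust them to the weights you actually downloaded:

```bash
# Sketch of the HunyuanVideo variant of the command above (paths are assumptions).
CUDA_VISIBLE_DEVICES=0 python scripts/hunyuan/inference.py \
    --pretrained_model_path hunyuanvideo-community/HunyuanVideo \
    --val_data_json_file prompt.json \
    --output_dir results/hunyuan/vorta \
    --resume_dir results/hunyuan/train \
    --resume ckpt/step-000100 \
    --enable_cpu_offload \
    --seed 1234
```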
+
+ You can edit `prompt.json` (or pass a different file via the `--val_data_json_file` option) to change the text prompts. See the source code `scripts/<model_name>/inference.py`, or run `python scripts/<model_name>/inference.py --help`, for more detailed explanations of the arguments.
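Inspecting the accepted arguments is the safest way to learn the expected prompt-file format; the JSON layout below is purely hypothetical and only illustrates where a prompt string might go:

```bash
# Hypothetical prompt file -- the real schema is defined in
# scripts/<model_name>/inference.py, so check it (or --help) before use.
cat > prompt.json <<'EOF'
[
  {"prompt": "A corgi surfing a wave at sunset, cinematic lighting"}
]
EOF

# Full argument reference, including --val_data_json_file:
python scripts/wan/inference.py --help
```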
+
+ ## Acknowledgements
+ Thanks to the authors of the following repositories for their great work and for open-sourcing their code and models: [Diffusers](https://github.com/huggingface/diffusers), [HunyuanVideo](https://github.com/Tencent/HunyuanVideo), [Wan 2.1](https://github.com/Wan-Video/Wan2.1), [FastVideo](https://github.com/hao-ai-lab/FastVideo).

+ ## Citation
+ If you find our work helpful or inspiring, please feel free to cite it.
+ ```bibtex
+ @article{wenhao728_2025_vorta,
+   author  = {Wenhao and Li, Wenhao and Wang, Yanan and Zhao, Jizhao and Zheng, Wei},
+   title   = {VORTA: Efficient Video Diffusion via Routing Sparse Attention},
+   journal = {arXiv preprint arXiv:2505.18809},
+   year    = {2025}
+ }
+ ```