imsuperkong committed on
Commit
064748c
· verified · 1 Parent(s): 6aa43ce

Upload 4 files

.gitattributes CHANGED
@@ -33,3 +33,6 @@ saved_model/**/* filter=lfs diff=lfs merge=lfs -text
  *.zip filter=lfs diff=lfs merge=lfs -text
  *.zst filter=lfs diff=lfs merge=lfs -text
  *tfevents* filter=lfs diff=lfs merge=lfs -text
+ assets/teaser.png filter=lfs diff=lfs merge=lfs -text
+ assets/web_teaser.gif filter=lfs diff=lfs merge=lfs -text
+ assets/web_teaser.mp4 filter=lfs diff=lfs merge=lfs -text
README.md ADDED
@@ -0,0 +1,195 @@
<div align="center">
<img src="assets/teaser.png">

<a href="https://hyokong.github.io/worldwarp-page/"><h1>🌏 WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion 🌀</h1></a>
</div>

<h5 align="center">

[![Home Page](https://img.shields.io/badge/Project-Website-33728E.svg)](https://hyokong.github.io/worldwarp-page/)
[![arXiv](https://img.shields.io/badge/Arxiv-2509.26645-b31b1b.svg?logo=arXiv)](https://arxiv.org/abs/2509.26645)
[![HuggingFace](https://img.shields.io/badge/%F0%9F%A4%97%20Hugging%20Face-Demo-blue)](https://huggingface.co/imsuperkong/worldwarp) [![Watch on YouTube](https://img.shields.io/badge/YouTube-Demo_Video-red?style=flat&logo=youtube)](https://www.youtube.com/watch?v=rfMHxb--cKs)

[Hanyang Kong](https://hyokong.github.io/),
[Xingyi Yang](https://adamdad.github.io/),
Xiaoxu Zheng,
[Xinchao Wang](https://sites.google.com/site/sitexinchaowang/)
</h5>

**TL;DR**: 🔭 Single-image long-range view generation via an <u>asynchronous chunk-wise autoregressive diffusion framework</u> that uses <u>explicit camera conditioning</u> and an <u>online 3D cache</u> for geometric consistency.

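The chunk-wise autoregressive loop from the TL;DR can be pictured schematically. The sketch below is illustrative only: every function is a toy placeholder (none of these names exist in the WorldWarp codebase), and it mirrors just the control flow described above, i.e. warp geometry from a 3D cache for explicit camera conditioning, denoise one chunk with video diffusion, update the cache, repeat.

```python
# Illustrative-only sketch of chunk-wise autoregressive generation with a 3D cache.
# All helpers are toy placeholders, NOT WorldWarp's actual API.

def chunked(seq, size):
    """Yield consecutive chunks of `seq` of at most `size` items."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

def render_from_cache(cache, pose):
    """Placeholder 'warp': tag a frame with its target camera pose."""
    return ("warped", pose, len(cache))

def denoise_chunk(warped, context):
    """Placeholder for the chunk-wise video diffusion step."""
    return [("frame", w[1]) for w in warped]

def update_cache(cache, frames, poses):
    """Placeholder for the online 3D cache update from new frames."""
    return cache + list(zip(poses, frames))

def generate(first_frame, camera_poses, chunk_size=4):
    cache = [(None, first_frame)]      # cache initialized from the input image
    frames = [first_frame]
    for poses in chunked(camera_poses, chunk_size):
        warped = [render_from_cache(cache, p) for p in poses]  # camera conditioning
        new_frames = denoise_chunk(warped, context=frames)
        cache = update_cache(cache, new_frames, poses)
        frames += new_frames
    return frames

video = generate("img0", list(range(10)))
print(len(video))  # 11: the input frame plus one frame per camera pose
```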
## 🎬 Demo Video

▶️ **Click the GIF to watch the full video with sound.**

<p align="center">
  <a href="https://www.youtube.com/watch?v=rfMHxb--cKs">
    <img src="assets/web_teaser.gif" alt="WorldWarp Demo" width="100%">
  </a>
</p>

## 🛠️ Installation

> ⚠️ **Hardware Note:** The current implementation requires high GPU memory (~40 GB VRAM). We are optimizing the code to reduce this footprint.

### 🧬 Cloning the Repository
The repository contains submodules, so please clone it recursively:
```bash
git clone https://github.com/HyoKong/WorldWarp.git --recursive
cd WorldWarp
```

### 🐍 Create the Environment

Create and activate a conda environment:
```bash
conda create -n worldwarp python=3.12 -y
conda activate worldwarp
```

### 🔥 Install PyTorch
Install PyTorch with CUDA 12.6 support (or visit [PyTorch Previous Versions](https://pytorch.org/get-started/previous-versions/) for other CUDA configurations):
```bash
pip install torch==2.7.1 torchvision==0.22.1 torchaudio==2.7.1 --index-url https://download.pytorch.org/whl/cu126
```

### 📦 Install Dependencies & Compile Extensions
These packages require compilation against the specific PyTorch version installed above.

```bash
# Core compiled dependencies
pip install flash-attn --no-build-isolation
pip install "git+https://github.com/facebookresearch/pytorch3d.git" --no-build-isolation

# Local modules
pip install src/fused-ssim/ --no-build-isolation
pip install src/simple-knn/ --no-build-isolation

# Remaining Python dependencies
pip install -r requirements.txt
```

### 🏗️ Build Other Extensions
```bash
cd src/ttt3r/croco/models/curope/
python setup.py build_ext --inplace
cd -  # return to the project root
```

## ☁️ Download Checkpoints

```bash
mkdir ckpt
hf download Wan-AI/Wan2.1-T2V-1.3B-Diffusers --local-dir ckpt/Wan-AI/Wan2.1-T2V-1.3B-Diffusers
hf download Qwen/Qwen2.5-VL-7B-Instruct --local-dir ckpt/Qwen/Qwen2.5-VL-7B-Instruct
hf download imsuperkong/worldwarp --local-dir ckpt/

cd src/ttt3r/
gdown --fuzzy https://drive.google.com/file/d/1Asz-ZB3FfpzZYwunhQvNPZEUA8XUNAYD/view?usp=drive_link
cd ../..
```

## 🎨 GUI Demo

```bash
python gradio_demo.py
```

The web interface will open at `http://localhost:7890`.

---

### 🚀 Quick Start

**1️⃣ Choose a Starting Image**

- **📚 Examples Tab**: Click a pre-made example image (the prompt auto-fills)
- **🎨 Generate Tab**: Click "Generate First Frame" to create one from your prompt
- **📤 Upload Tab**: Upload your own image

**2️⃣ Select a Camera Movement** (Recommended: 📹 From Video)

- **From Video** (easiest and most reliable)
  - Click **"📹 From Video"** mode
  - Select an example video from the gallery, or upload your own
  - Click **"🎯 Load Poses"** to extract the camera trajectory
  - Poses are automatically cached for reuse

- **Preset Movements**
  - Select **"🎯 Preset"** mode
  - Choose movements: `DOLLY_IN`, `PAN_LEFT`, `PAN_RIGHT`, etc.
  - Movements can be combined, e.g. `DOLLY_IN + PAN_RIGHT`

- **Custom** (advanced)
  - Select **"🔧 Custom"** mode
  - Manually control the rotation and translation parameters

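Combining presets like `DOLLY_IN + PAN_RIGHT` amounts to composing per-frame camera transforms. A minimal sketch of that idea with NumPy (the helper names, step sizes, and 4x4 camera-to-world convention are hypothetical illustrations, not WorldWarp's actual preset implementation):

```python
import numpy as np

def dolly_in(step=0.1):
    """Per-frame translation along the camera's forward (-z) axis (illustrative)."""
    t = np.eye(4)
    t[2, 3] = -step
    return t

def pan_right(deg=1.0):
    """Per-frame rotation about the camera's up (y) axis (illustrative)."""
    r = np.radians(deg)
    m = np.eye(4)
    m[0, 0], m[0, 2] = np.cos(r), np.sin(r)
    m[2, 0], m[2, 2] = -np.sin(r), np.cos(r)
    return m

# DOLLY_IN + PAN_RIGHT: apply both incremental transforms each frame
pose = np.eye(4)
trajectory = [pose]
for _ in range(10):
    pose = pose @ dolly_in() @ pan_right()
    trajectory.append(pose)
# The camera advances along -z while gradually turning to the right.
```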
**3️⃣ Configure & Generate**

**Essential Parameters:**

- 💪 **Strength (0.5-0.8)**
  - **Higher (0.7-0.8)**: more generated detail, richer content
    - ⚠️ May introduce content changes due to the higher creative freedom
  - **Lower (0.5-0.6)**: more accurate camera control, closer to the input
    - ⚠️ May produce blurry results because the diffusion model has less freedom
  - **Trade-off**: higher strength = more detail but less control; lower strength = better control but potentially blurry output

- ⚡ **Speed Multiplier**
  - **Purpose**: adjust the camera movement velocity to match your scene scale
  - **Why it's needed**: the reference video's camera movement scale may not match your scene (e.g., a drone video moving 10 meters may be too fast for a small room)
  - **< 1.0**: slower camera movement (e.g., 0.5 = half speed)
  - **= 1.0**: original speed from the reference
  - **> 1.0**: faster camera movement (e.g., 2.0 = double speed)
  - **Tip**: start at 1.0, then adjust based on whether the motion feels too fast or too slow

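One way to picture the speed multiplier is as a scale on the per-frame camera displacement of the loaded trajectory. A minimal sketch under that assumption (the function and the 4x4 camera-to-world pose layout are illustrative, not WorldWarp's actual code):

```python
import numpy as np

def scale_trajectory_speed(poses, multiplier):
    """Scale the per-frame camera translation by `multiplier`.

    poses: list of 4x4 camera-to-world matrices (hypothetical layout).
    Rotations are kept as-is; only the displacement between
    consecutive frames is scaled.
    """
    scaled = [poses[0].copy()]
    for prev, cur in zip(poses, poses[1:]):
        step = cur[:3, 3] - prev[:3, 3]                 # displacement this frame
        new_pose = cur.copy()
        new_pose[:3, 3] = scaled[-1][:3, 3] + multiplier * step
        scaled.append(new_pose)
    return scaled

# Example: a dolly trajectory moving 0.1 units per frame along +z
poses = []
for i in range(5):
    p = np.eye(4)
    p[2, 3] = 0.1 * i
    poses.append(p)

half_speed = scale_trajectory_speed(poses, 0.5)
print(half_speed[-1][2, 3])  # 0.2 instead of the original 0.4
```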
153
+
154
+ #### 🌟 Best Practices
155
+
156
+ - πŸ‘οΈ **Generate one chunk at a time**
157
+ - Lets you preview each chunk's quality before continuing
158
+ - Easier to identify issues early
159
+
160
+ - ↩️ **Use Rollback for iteration**
161
+ - If a chunk is unsatisfactory, enter its number in **"Rollback to #"**
162
+ - Click **"βœ‚οΈ Rollback"** to remove it
163
+ - Adjust parameters and regenerate
164
+
165
+ - 🏎️ **Adjust Speed Multiplier per scene**
166
+ - If camera moves too fast β†’ decrease value (e.g., 0.5-0.7)
167
+ - If camera moves too slow β†’ increase value (e.g., 1.5-2.0)
168
+
169
+
170
+
171
+
172
+
173
+
## 🙌 Acknowledgements
Our code is based on the following awesome repositories:

- [DFoT](https://github.com/kwsong0113/diffusion-forcing-transformer)
- [TTT3R](https://github.com/Inception3D/TTT3R)

We thank the authors for releasing their code!

## 📖 Citation

If you find our work useful, please cite:

```bibtex
@misc{kong2025worldwarp,
      title={WorldWarp: Propagating 3D Geometry with Asynchronous Video Diffusion},
      author={Hanyang Kong and Xingyi Yang and Xiaoxu Zheng and Xinchao Wang},
      year={2025},
      eprint={2512.xxxxx},
      archivePrefix={arXiv},
      primaryClass={cs.CV}
}
```
assets/teaser.png ADDED

Git LFS Details

  • SHA256: 80c64c50a99a8d639820dcf9649ee6aab33175c68e3f37278cf69bdad6d46e00
  • Pointer size: 133 Bytes
  • Size of remote file: 10.5 MB
assets/web_teaser.gif ADDED

Git LFS Details

  • SHA256: 6814d2307f8d56b56cc33b89188c22992ceaf3f2a6a073353cd20aff4b9aebeb
  • Pointer size: 133 Bytes
  • Size of remote file: 19.7 MB
assets/web_teaser.mp4 ADDED
@@ -0,0 +1,3 @@
+ version https://git-lfs.github.com/spec/v1
+ oid sha256:f71d1bbb4954fd8ecb1afb76e7a3414769cfc4647fd0ff835dcfa4c4146161e2
+ size 24182307