---
library_name: minisora
license: mit
language:
- en
tags:
- text-to-video
- video-diffusion
- continuation
- colossalai
pipeline_tag: text-to-video
---
# MiniSora: Fully Open Video Diffusion with ColossalAI
- [GitHub: YN35/minisora](https://github.com/YN35/minisora)
- [Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)
---
## 🧾 Overview
**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.
- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model + pipeline, inspired by Diffusers
- Includes a continuation demo to generate the "next" part of an existing video
This model card hosts the DiT pipeline trained on DMLab trajectories and published as `ramu0e/minisora-dmlab`.
---
## 🚀 Inference: Text-to-Video (Minimal Example)
```python
from minisora.models import DiTPipeline
pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")
output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents # shape: (B, C, F, H, W)
```
`latents` is a batch of video tensors in the same normalized space used during training.
Use the scripts in the repository to decode or visualize them.
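As a concrete sketch of that decoding step: if the normalized space is pixel values in `[-1, 1]` (an assumption for illustration; check the repository's visualization scripts for the actual convention), converting latents to displayable frames looks like this:

```python
import numpy as np

def latents_to_frames(latents: np.ndarray) -> np.ndarray:
    """Map (B, C, F, H, W) latents in [-1, 1] to uint8 frames (B, F, H, W, C).

    Assumes the pipeline's normalized space is pixel values in [-1, 1];
    this is an illustrative guess, not the repository's actual decoder.
    """
    frames = np.clip(latents, -1.0, 1.0)    # guard against sampler overshoot
    frames = (frames + 1.0) * 127.5         # [-1, 1] -> [0, 255]
    frames = np.moveaxis(frames, 1, -1)     # channels last for image viewers
    return np.rint(frames).astype(np.uint8)

frames = latents_to_frames(np.zeros((1, 3, 20, 64, 64)))
print(frames.shape)  # (1, 20, 64, 64, 3)
```

Each `frames[b, f]` is then an H×W×C image that can be written out with any image or video library.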
---
## 🎥 Continuation: Generate the Rest of a Video
MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A demo script is bundled to extend existing videos.
```bash
uv run scripts/demo/full_continuation.py \
    --model-id ramu0e/minisora-dmlab \
    --input-video path/to/input.mp4 \
    --num-extend-frames 12 \
    --num-inference-steps 28 \
    --seed 1234
```
See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
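The core idea behind prefix conditioning can be sketched in a few lines: at each denoising step, the frames that were observed are overwritten with their known values, so the sampler only generates the remainder. The helper below is hypothetical (names and the (B, C, F, H, W) layout are assumed); see `scripts/demo/full_continuation.py` for the real logic.

```python
import numpy as np

def apply_prefix_condition(noisy: np.ndarray, clean: np.ndarray,
                           num_cond_frames: int) -> np.ndarray:
    """Pin the first `num_cond_frames` frames to the observed video.

    Called after every denoising step so the sampler only "fills in" the
    continuation. Hypothetical illustration of the masking idea.
    """
    out = noisy.copy()
    out[:, :, :num_cond_frames] = clean[:, :, :num_cond_frames]
    return out

# The observed 4-frame prefix survives; the remaining frames stay free.
noisy = np.ones((1, 3, 8, 4, 4))
clean = np.zeros((1, 3, 8, 4, 4))
out = apply_prefix_condition(noisy, clean, num_cond_frames=4)
```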
---
## 🧩 Key Features
- **End-to-End Transparency**
  - Model definition (DiT): `src/minisora/models/modeling_dit.py`
  - Pipeline: `src/minisora/models/pipeline_dit.py`
  - Training script: `scripts/train.py`
  - Data loaders: `src/minisora/data/`

  Every stage from data to inference is available.
- **ColossalAI for Scale-Out Training**
  - Zero / DDP plugins
  - Designed for multi-GPU and multi-node setups
  - Easy experimentation with large video models
- **Simple, Readable Implementation**
  - Dependency management via `uv` (`uv sync` / `uv run`)
  - Minimal Diffusers-inspired video DiT pipeline
  - Experiments and analysis scripts organized under `reports/`
- **Continuation / Conditioning Ready**
  - Masking logic to fix conditioned frames
  - Training scheme that applies noise to only part of the sequence
---
## 🛠 Install & Setup
### 1. Clone the Repository
```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```
### 2. Install Dependencies with `uv`
```bash
uv sync
```
All scripts can then be executed through `uv run ...`.
---
## 📦 This Pipeline (`ramu0e/minisora-dmlab`)
This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.
- **Model type**: DiT-based video diffusion model
- **Training resolution**: e.g., 64×64 or 128×128 (see `reports/` in the repo)
- **Frames per sample**: typically 20
- **Library**: `minisora` (custom lightweight framework)
- **Use case**: research or sample-quality video generation
---
## 🧪 Training (Summary)
Complete training code is available in the repository.
- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noise applied to only a subset of frames)
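A minimal sketch of that conditioning-aware objective (the (B, C, F, H, W) layout and helper names are assumed; the actual loss lives in `scripts/train.py`): interpolate toward noise rectified-flow style, keep the conditioned prefix clean, and mask it out of the loss.

```python
import numpy as np

rng = np.random.default_rng(0)

def partial_noising_step(x0: np.ndarray, t: float, num_cond_frames: int):
    """Rectified-flow interpolation x_t = (1 - t) * x0 + t * noise.

    The model regresses the velocity (noise - x0), but only on frames after
    the conditioning prefix. Illustrative sketch, not the repo's exact loss.
    """
    noise = rng.standard_normal(x0.shape)
    x_t = (1.0 - t) * x0 + t * noise
    velocity = noise - x0
    x_t[:, :, :num_cond_frames] = x0[:, :, :num_cond_frames]  # prefix stays clean
    loss_mask = np.ones_like(x0)
    loss_mask[:, :, :num_cond_frames] = 0.0  # prefix excluded from the loss
    return x_t, velocity, loss_mask
```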
### Example: Single-Node Training
```bash
uv run scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp1 \
    --batch_size 32 \
    --precision bf16
```
### Example: Multi-Node (torchrun + ColossalAI)
```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
    --dataset_type minecraft \
    --data_root /path/to/train_data \
    --outputs outputs/exp-multinode \
    --batch_size 64 \
    --plugin zero --zero 1
```
Refer to `scripts/train.py` for all available options.
---
## 📁 Repository Structure (Excerpt)
- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts
---
## 📝 Limitations & Notes
- This checkpoint targets research-scale experiments.
- Quality at higher resolution or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.
---
## 🤝 Contributions
- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).
---
## 📄 License
- Code and weights are released under the **MIT License**.
Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).
```text
MIT License
Copyright (c) YN
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```