---
library_name: minisora
license: mit
language:
  - en
tags:
  - text-to-video
  - video-diffusion
  - continuation
  - colossalai
pipeline_tag: text-to-video
---

# MiniSora: Fully Open Video Diffusion with ColossalAI

[GitHub: YN35/minisora](https://github.com/YN35/minisora)  
[Author (X / Twitter): @__ramu0e__](https://x.com/__ramu0e__)

---

## 🧾 Overview

**MiniSora** is a fully open video diffusion codebase designed for everything from research to production.

- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via **ColossalAI**
- Simple DiT-based video model + pipeline, inspired by Diffusers
- Includes a continuation demo to generate the "next" part of an existing video

This repository hosts the DiT pipeline checkpoint trained on DMLab trajectories, published as `ramu0e/minisora-dmlab`.

---

## 🚀 Inference: Text-to-Video (Minimal Example)

```python
from minisora.models import DiTPipeline

pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)
latents = output.latents  # shape: (B, C, F, H, W)
```

The returned `latents` are video tensors in the same normalized space used during training.  
Use the scripts in the repository to decode or visualize them.
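If the latents live in a `[-1, 1]`-normalized pixel space (an assumption here; the repository's decoding scripts are authoritative), converting them to displayable `uint8` frames might look like this sketch:

```python
import numpy as np

def latents_to_uint8_frames(latents: np.ndarray) -> np.ndarray:
    """Map normalized latents (B, C, F, H, W) to uint8 frames (B, F, H, W, C).

    Assumes values are roughly in [-1, 1]; clips outliers, then rescales to [0, 255].
    """
    frames = np.clip(latents, -1.0, 1.0)                 # guard against outliers
    frames = ((frames + 1.0) * 127.5).astype(np.uint8)   # [-1, 1] -> [0, 255]
    return frames.transpose(0, 2, 3, 4, 1)               # channels-last per frame

# Example on a dummy tensor matching the shape shown above
dummy = np.random.uniform(-1.0, 1.0, size=(1, 3, 20, 64, 64))
frames = latents_to_uint8_frames(dummy)
print(frames.shape)  # (1, 20, 64, 64, 3)
```

From there, each `(H, W, C)` frame can be written out with any video or image library.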

---

## 🎥 Continuation: Generate the Rest of a Video

MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix.  
A demo script is bundled to extend existing videos.

```bash
uv run scripts/demo/full_continuation.py \
  --model-id ramu0e/minisora-dmlab \
  --input-video path/to/input.mp4 \
  --num-extend-frames 12 \
  --num-inference-steps 28 \
  --seed 1234
```

See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
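Conceptually, prefix-conditioned sampling keeps the observed frames clamped to their known values while the remaining frames are denoised. The following sketch illustrates that mechanic with a hypothetical `denoise_step` callable; it is not the repository's actual sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def continue_video(prefix, num_extend, num_steps, denoise_step):
    """Illustrative prefix-conditioned sampling loop (not the repo's code).

    prefix: (C, F_obs, H, W) observed frames; new frames start as pure noise.
    denoise_step: hypothetical callable performing one denoising update.
    """
    c, f_obs, h, w = prefix.shape
    # Append noise frames after the observed prefix along the frame axis
    x = np.concatenate([prefix, rng.normal(size=(c, num_extend, h, w))], axis=1)
    for step in range(num_steps):
        x = denoise_step(x, step)
        x[:, :f_obs] = prefix  # re-clamp observed frames after every update
    return x

# Identity-like "denoiser" just to show the mechanics
out = continue_video(np.zeros((3, 8, 64, 64)), 12, 4, lambda x, s: x * 0.9)
print(out.shape)  # (3, 20, 64, 64)
```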

---

## 🧩 Key Features

- **End-to-End Transparency**  
  - Model definition (DiT): `src/minisora/models/modeling_dit.py`  
  - Pipeline: `src/minisora/models/pipeline_dit.py`  
  - Training script: `scripts/train.py`  
  - Data loaders: `src/minisora/data/`  
  Every stage from data to inference is available.

- **ColossalAI for Scale-Out Training**  
  - Zero / DDP plugins  
  - Designed for multi-GPU and multi-node setups  
  - Easy experimentation with large video models

- **Simple, Readable Implementation**  
  - Dependency management via `uv` (`uv sync` / `uv run`)  
  - Minimal Diffusers-inspired video DiT pipeline  
  - Experiments and analysis scripts organized under `reports/`

- **Continuation / Conditioning Ready**  
  - Masking logic to fix conditioned frames  
  - Training scheme that applies noise to only part of the sequence

---

## 🛠 Install & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```

### 2. Install Dependencies with `uv`

```bash
uv sync
```

All scripts can then be executed through `uv run ...`.

---

## 📦 This Pipeline (`ramu0e/minisora-dmlab`)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

- **Model type**: DiT-based video diffusion model  
- **Training resolution**: e.g., 64Γ—64 or 128Γ—128 (see `reports/` in the repo)  
- **Frames per sample**: typically 20  
- **Library**: `minisora` (custom lightweight framework)  
- **Use case**: research or sample-quality video generation

---

## 🧪 Training (Summary)

Complete training code is available in the repository.

- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler`
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noise applied to only a subset of frames)
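The rectified-flow objective with partial-frame noising can be sketched as follows. This is a generic illustration of the scheme (interpolant `x_t = (1 - t) * x0 + t * noise`, velocity target `noise - x0`, binary per-frame mask), not the exact code in `scripts/train.py`:

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_target(x0, noise, t):
    """Rectified flow: interpolate x_t = (1 - t) * x0 + t * noise;
    the model regresses the constant velocity v = noise - x0."""
    x_t = (1.0 - t) * x0 + t * noise
    v_target = noise - x0
    return x_t, v_target

# Video batch (B, C, F, H, W); the mask noises only the last 12 of 20 frames
x0 = rng.normal(size=(2, 3, 20, 16, 16))
noise = rng.normal(size=x0.shape)
t = 0.5
mask = np.zeros((1, 1, 20, 1, 1))
mask[:, :, 8:] = 1.0  # 1 = frame receives noise, 0 = conditioned (kept clean)

x_t, v_target = flow_matching_target(x0, noise, t)
x_t = mask * x_t + (1.0 - mask) * x0  # conditioned frames stay clean

v_pred = np.zeros_like(v_target)      # stand-in for the model's output
loss = ((mask * (v_pred - v_target)) ** 2).mean()  # loss only on noised frames
print(x_t.shape)  # (2, 3, 20, 16, 16)
```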

### Example: Single-Node Training

```bash
uv run scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp1 \
  --batch_size 32 \
  --precision bf16
```

### Example: Multi-Node (torchrun + ColossalAI)

```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp-multinode \
  --batch_size 64 \
  --plugin zero --zero 1
```

Refer to `scripts/train.py` for all available options.

---

## 📚 Repository Structure (Excerpt)

- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts

---

## πŸ” Limitations & Notes

- This checkpoint targets research-scale experiments.  
- Quality at higher resolution or longer durations depends on data and hyperparameters.  
- Continuation quality varies with the provided prefix and conditioning setup.

---

## 🤝 Contributions

- Contributions to code, models, and docs are welcome.  
- Please open issues or PRs at [YN35/minisora](https://github.com/YN35/minisora).

---

## 📄 License

- Code and weights are released under the **MIT License**.  
  Commercial use, modification, and redistribution are all permitted (see the GitHub `LICENSE`).

```text
MIT License
Copyright (c) YN
Permission is hereby granted, free of charge, to any person obtaining a copy
...
```