# MiniSora: Fully Open Video Diffusion with ColossalAI
GitHub: YN35/minisora
Author (X / Twitter): @ramu0e
## Overview
MiniSora is a fully open video diffusion codebase designed for everything from research to production.
- All training, inference, and evaluation scripts are available
- Supports multi-GPU / multi-node training via ColossalAI
- Simple DiT-based video model + pipeline, inspired by Diffusers
- Includes a continuation demo to generate the "next" part of an existing video
This model card hosts the DiT pipeline trained on DMLab trajectories and published as ramu0e/minisora-dmlab.
## Inference: Text-to-Video (Minimal Example)
```python
from minisora.models import DiTPipeline

# Load the pretrained DMLab checkpoint from the Hugging Face Hub
pipeline = DiTPipeline.from_pretrained("ramu0e/minisora-dmlab")

output = pipeline(
    batch_size=1,
    num_inference_steps=28,
    height=64,
    width=64,
    num_frames=20,
)

latents = output.latents  # shape: (B, C, F, H, W)
```
The returned `latents` are video tensors in the same normalized space used during training. Use the scripts in the repository to decode or visualize them.
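As a minimal decoding sketch only, assuming the latents are pixel-space RGB frames normalized to `[-1, 1]` (the repository's scripts define the actual convention, so treat the normalization range and file layout below as assumptions):

```python
from pathlib import Path

import numpy as np
from PIL import Image

# ASSUMPTION: latents are pixel-space RGB frames normalized to [-1, 1];
# check the repository's decoding scripts for the real convention.
video = latents[0].detach().float().cpu().numpy()   # (C, F, H, W)
video = np.transpose(video, (1, 2, 3, 0))           # (F, H, W, C)
video = ((video.clip(-1.0, 1.0) + 1.0) * 127.5).astype(np.uint8)

# Dump each frame as a PNG for quick inspection.
out_dir = Path("frames")
out_dir.mkdir(exist_ok=True)
for i, frame in enumerate(video):
    Image.fromarray(frame).save(out_dir / f"frame_{i:03d}.png")
```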
## Continuation: Generate the Rest of a Video
MiniSora also supports continuation-style generation (like Sora), where subsequent frames are sampled while conditioning on the observed prefix.
A demo script is bundled to extend existing videos.
```bash
uv run scripts/demo/full_continuation.py \
  --model-id ramu0e/minisora-dmlab \
  --input-video path/to/input.mp4 \
  --num-extend-frames 12 \
  --num-inference-steps 28 \
  --seed 1234
```
See `scripts/demo/full_continuation.py` for the exact arguments and I/O specification.
## Key Features

### End-to-End Transparency

- Model definition (DiT): `src/minisora/models/modeling_dit.py`
- Pipeline: `src/minisora/models/pipeline_dit.py`
- Training script: `scripts/train.py`
- Data loaders: `src/minisora/data/`

Every stage from data to inference is available.

### ColossalAI for Scale-Out Training

- Zero / DDP plugins (Booster pattern sketched below)
- Designed for multi-GPU and multi-node setups
- Easy experimentation with large video models
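For orientation, the Booster pattern in recent ColossalAI releases looks roughly like the sketch below. This is generic ColossalAI usage, not MiniSora's exact training code; `model`, `optimizer`, and `train_loader` are placeholders:

```python
import colossalai
from colossalai.booster import Booster
from colossalai.booster.plugin import LowLevelZeroPlugin

# Initialize the distributed environment (expects torchrun-style env vars).
colossalai.launch_from_torch()

# ZeRO stage 1 with bf16; TorchDDPPlugin() would give plain DDP instead.
plugin = LowLevelZeroPlugin(stage=1, precision="bf16")
booster = Booster(plugin=plugin)

# Wrap existing objects; `model`, `optimizer`, `train_loader` are placeholders.
model, optimizer, _, train_loader, _ = booster.boost(
    model, optimizer, dataloader=train_loader
)
```

From there, `booster.backward(loss, optimizer)` takes the place of a plain `loss.backward()` so the plugin can manage sharded gradients.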
### Simple, Readable Implementation

- Dependency management via `uv` (`uv sync` / `uv run`)
- Minimal Diffusers-inspired video DiT pipeline
- Experiments and analysis scripts organized under `reports/`

### Continuation / Conditioning Ready

- Masking logic to keep conditioned frames fixed (see the sketch below)
- Training scheme that applies noise to only part of the sequence
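A minimal sketch of that idea, assuming a rectified-flow-style interpolation toward noise; this is illustrative, not MiniSora's actual implementation, and every name below is a placeholder:

```python
import torch

def noise_unconditioned_frames(x0: torch.Tensor, cond_mask: torch.Tensor, t: float) -> torch.Tensor:
    """Keep conditioned frames clean; move the rest toward noise.

    x0:        (B, C, F, H, W) clean video
    cond_mask: (F,) bool, True for frames that stay fixed as conditioning
    t:         noise level in [0, 1]
    """
    noise = torch.randn_like(x0)
    xt = (1.0 - t) * x0 + t * noise            # linear path from data to noise
    keep = cond_mask.view(1, 1, -1, 1, 1)      # broadcast over B, C, H, W
    return torch.where(keep, x0, xt)
```

The training loss can be restricted with the same mask so that the fixed conditioning frames contribute no gradient.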
## Install & Setup

### 1. Clone the Repository

```bash
git clone https://github.com/YN35/minisora.git
cd minisora
```

### 2. Install Dependencies with uv

```bash
uv sync
```

All scripts can then be executed through `uv run ...`.
## This Pipeline (ramu0e/minisora-dmlab)

This Hugging Face repository distributes the MiniSora DiT pipeline checkpoint trained on DMLab trajectories.

- Model type: DiT-based video diffusion model
- Training resolution: e.g., 64×64 or 128×128 (see `reports/` in the repo)
- Frames per sample: typically 20
- Library: `minisora` (custom lightweight framework)
- Use case: research or sample-quality video generation
## Training (Summary)

The complete training code is available in the repository.

- Main script: `scripts/train.py`
- Highlights:
  - Rectified-flow style training with `FlowMatchEulerDiscreteScheduler` (a step is sketched below)
  - ColossalAI Booster to switch between Zero / DDP
  - Conditioning-aware objective (noise applied to only a subset of frames)
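As a rough illustration of such a step under the standard flow-matching parameterization, where the model predicts the velocity `noise - x0`; `model` and `batch_of_videos` are placeholders, and `scripts/train.py` remains the authoritative reference:

```python
import torch
import torch.nn.functional as F

# One illustrative rectified-flow training step; names are placeholders.
x0 = batch_of_videos                                  # (B, C, F, H, W) clean clips
noise = torch.randn_like(x0)
t = torch.rand(x0.shape[0], device=x0.device)         # per-sample time in (0, 1)
t_ = t.view(-1, 1, 1, 1, 1)

x_t = (1.0 - t_) * x0 + t_ * noise                    # linear path: data -> noise
target = noise - x0                                   # velocity target

pred = model(x_t, t)                                  # placeholder forward signature
loss = F.mse_loss(pred, target)
loss.backward()
```

At sampling time, `FlowMatchEulerDiscreteScheduler` integrates the predicted velocity back from noise to data.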
### Example: Single-Node Training

```bash
uv run scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp1 \
  --batch_size 32 \
  --precision bf16
```

### Example: Multi-Node (torchrun + ColossalAI)

```bash
torchrun --nnodes 2 --nproc_per_node 8 scripts/train.py \
  --dataset_type minecraft \
  --data_root /path/to/train_data \
  --outputs outputs/exp-multinode \
  --batch_size 64 \
  --plugin zero --zero 1
```

Refer to `scripts/train.py` for all available options.
## Repository Structure (Excerpt)

- `src/minisora/models/modeling_dit.py` – core DiT transformer for video
- `src/minisora/models/pipeline_dit.py` – Diffusers-style pipeline (`DiTPipeline`)
- `src/minisora/data/` – datasets and distributed samplers (DMLab, Minecraft)
- `scripts/train.py` – ColossalAI-based training loop
- `scripts/demo/full_vgen.py` – simple end-to-end video generation demo
- `scripts/demo/full_continuation.py` – continuation demo
- `reports/` – experiment notes, mask visualizations, metric scripts
## Limitations & Notes
- This checkpoint targets research-scale experiments.
- Quality at higher resolution or longer durations depends on data and hyperparameters.
- Continuation quality varies with the provided prefix and conditioning setup.
## Contributions
- Contributions to code, models, and docs are welcome.
- Please open issues or PRs at YN35/minisora.
## License

- Code and weights are released under the MIT License.
- Commercial use, modification, and redistribution are all permitted (see the `LICENSE` file on GitHub).

```
MIT License

Copyright (c) YN

Permission is hereby granted, free of charge, to any person obtaining a copy
...
```