---
library_name: diffusers
license: mit
pipeline_tag: image-to-image
tags:
- computed-tomography
- ct-reconstruction
- diffusion-model
- latent-diffusion
- inverse-problems
- dm4ct
- sparse-view-ct
---
# Latent Diffusion Model – LoDoInd (DM4CT)
This repository contains the pretrained **latent-space diffusion model** used in the
**DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction (ICLR 2026)** benchmark.
- **Paper:** [DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction](https://huggingface.co/papers/2602.18589)
- **Project Page:** [https://dm4ct.github.io/DM4CT/](https://dm4ct.github.io/DM4CT/)
- **Codebase:** [https://github.com/DM4CT/DM4CT](https://github.com/DM4CT/DM4CT)
---
## 🔬 Model Overview
This model learns a **prior over CT reconstruction images in a compressed latent space** using a denoising diffusion probabilistic model (DDPM).
Unlike pixel diffusion models, diffusion is performed in the latent space of a pretrained autoencoder (VQ-VAE).
- **Architecture**:
  - VQ-VAE (image encoder/decoder)
  - 2D UNet operating in latent space
- **Input resolution (image space)**: 512 × 512
- **Channels**: 1 (grayscale CT slice)
- **Training objective**: ε-prediction (standard DDPM formulation)
- **Noise schedule**: Linear beta schedule
- **Training dataset**: Industrial CT dataset (LoDoInd)
- **Intensity normalization**: Rescaled to (-1, 1)
This model is intended to be combined with data-consistency correction for CT reconstruction tasks.
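As a rough illustration of the training objective and noise schedule listed above, the sketch below builds a linear beta schedule, noises a latent vector with the closed-form DDPM forward process, and evaluates the ε-prediction loss. The number of steps (1000), the beta endpoints (1e-4 and 0.02), and all function names are assumptions chosen for illustration; the values actually used are defined by the released scheduler config and the DM4CT training scripts.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas. The endpoint values are illustrative assumptions."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def add_noise(z0, eps, alpha_bar_t):
    """Forward process: z_t = sqrt(alpha_bar_t) * z0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bar_t)
    noise = math.sqrt(1.0 - alpha_bar_t)
    return [signal * z + noise * e for z, e in zip(z0, eps)]


def eps_prediction_loss(eps_pred, eps):
    """Mean squared error between predicted and true noise."""
    return sum((p - e) ** 2 for p, e in zip(eps_pred, eps)) / len(eps)


betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

rng = random.Random(0)
z0 = [0.5, -0.25, 0.0, 1.0]  # stand-in for a flattened latent
eps = [rng.gauss(0.0, 1.0) for _ in z0]
z_t = add_noise(z0, eps, alpha_bars[500])  # noised latent at an intermediate step
```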
---
## 📊 Dataset: LoDoInd
The model was trained on the industrial CT dataset [LoDoInd](https://zenodo.org/records/10391412).
- Reconstructed slices were rescaled to the range (-1, 1).
- The model learns an unconditional latent prior over CT slices; no specific geometry information is embedded in the weights.
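The rescaling mentioned above can be sketched as a linear map from a chosen intensity window to the range -1 to 1, together with the inverse map back to 0 to 1 for display. How the window bounds (`lo`, `hi`) were chosen for LoDoInd is not stated in this card, so they are left as explicit arguments; the function names are hypothetical.

```python
def rescale_to_unit_range(values, lo, hi):
    """Linearly map intensities from [lo, hi] to [-1, 1]."""
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    return [2.0 * (v - lo) / (hi - lo) - 1.0 for v in values]


def to_display_range(values):
    """Map model-space values from [-1, 1] to [0, 1], clamping anything outside."""
    return [min(1.0, max(0.0, v / 2.0 + 0.5)) for v in values]


# Flattened toy "slice" with an assumed intensity window of [0, 10]
normalized = rescale_to_unit_range([0.0, 5.0, 10.0], lo=0.0, hi=10.0)
displayed = to_display_range(normalized)
```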
---
## 🧠 Training Details
- **Optimizer**: AdamW
- **Learning rate**: 1e-4
- **Hardware**: NVIDIA A100 GPU
- **Training scripts**: Available in the [DM4CT GitHub repository](https://github.com/DM4CT/DM4CT/blob/main/train_latent.py).
---
## 🚀 Usage
You can load and use this model with the `diffusers` library:
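```python
import torch
from diffusers import LDMPipeline

# Load the pretrained VQ-VAE, latent UNet, and scheduler from the Hub
pipeline = LDMPipeline.from_pretrained(
    "jiayangshi/lodoind_latent_diffusion"
)
# Use the GPU when available; fall back to CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"
pipeline.to(device)

# Generate a sample (unconditional prior)
image = pipeline().images[0]
image.save("generated_ct_slice.png")
```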
Note: For actual CT reconstruction, this prior is typically used with data-consistency guidance as described in the paper.
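To give a sense of what a data-consistency correction does, the sketch below applies gradient steps on the least-squares term ||Ax - y||² to a toy problem. It is not the procedure from the paper: the small dense matrix `A` is a hypothetical stand-in for a CT forward projector, the step size and iteration count are arbitrary, and in a real pipeline such a step would be interleaved with the reverse diffusion updates rather than run on its own.

```python
def matvec(A, x):
    """Compute A @ x for a dense matrix stored as a list of rows."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def rmatvec(A, r):
    """Compute A^T @ r (the adjoint, i.e. back-projection in this toy setting)."""
    return [sum(A[i][j] * r[i] for i in range(len(A))) for j in range(len(A[0]))]


def data_consistency_step(x, A, y, step_size):
    """One gradient step on 0.5 * ||A x - y||^2: x <- x - step_size * A^T (A x - y)."""
    residual = [p - m for p, m in zip(matvec(A, x), y)]
    grad = rmatvec(A, residual)
    return [v - step_size * g for v, g in zip(x, grad)]


def residual_norm(x, A, y):
    """Euclidean norm of the measurement residual A x - y."""
    return sum((p - m) ** 2 for p, m in zip(matvec(A, x), y)) ** 0.5


# Hypothetical under-determined system: 2 measurements, 3 unknowns
A = [[1.0, 1.0, 0.0],
     [0.0, 1.0, 1.0]]
x_true = [0.2, -0.4, 0.6]
y = matvec(A, x_true)

x = [0.0, 0.0, 0.0]  # stand-in for a decoded prior sample
before = residual_norm(x, A, y)
for _ in range(50):
    x = data_consistency_step(x, A, y, step_size=0.3)
after = residual_norm(x, A, y)  # much smaller than `before`
```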
---
## Citation
```bibtex
@inproceedings{
shi2026dmct,
title={{DM}4{CT}: Benchmarking Diffusion Models for Computed Tomography Reconstruction},
author={Shi, Jiayang and Pelt, Dani{\"e}l M and Batenburg, K Joost},
booktitle={The Fourteenth International Conference on Learning Representations},
year={2026},
url={https://openreview.net/forum?id=YE5scJekg5}
}
```