---
library_name: diffusers
license: mit
pipeline_tag: image-to-image
tags:
- computed-tomography
- ct-reconstruction
- diffusion-model
- latent-diffusion
- inverse-problems
- dm4ct
- sparse-view-ct
---

# Latent Diffusion Model – LoDoInd (DM4CT)

This repository contains the pretrained **latent-space diffusion model** used in the **DM4CT: Benchmarking Diffusion Models for CT Reconstruction (ICLR 2026)** benchmark.

- **Paper:** [DM4CT: Benchmarking Diffusion Models for Computed Tomography Reconstruction](https://huggingface.co/papers/2602.18589)
- **Project Page:** [https://dm4ct.github.io/DM4CT/](https://dm4ct.github.io/DM4CT/)
- **Codebase:** [https://github.com/DM4CT/DM4CT](https://github.com/DM4CT/DM4CT)

---

## 🔬 Model Overview

This model learns a **prior over CT reconstruction images in a compressed latent space** using a denoising diffusion probabilistic model (DDPM). Unlike pixel-space diffusion models, it performs diffusion in the latent space of a pretrained autoencoder (VQ-VAE).

- **Architecture**:
  - VQ-VAE (image encoder/decoder)
  - 2D UNet operating in latent space
- **Input resolution (image space)**: 512 × 512
- **Channels**: 1 (grayscale CT slice)
- **Training objective**: ε-prediction (standard DDPM formulation)
- **Noise schedule**: Linear beta schedule
- **Training dataset**: Industrial CT dataset (LoDoInd)
- **Intensity normalization**: Rescaled to (-1, 1)

This model is intended to be combined with data-consistency correction for CT reconstruction tasks.

---

## 📊 Dataset: LoDoInd

The model was trained on the industrial CT dataset [LoDoInd](https://zenodo.org/records/10391412).

- Reconstructed slices were rescaled to the range (-1, 1).
- The model learns an unconditional latent prior over CT slices; no specific geometry information is embedded in the weights.
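
The intensity normalization, linear beta schedule, and ε-prediction objective listed above can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the DM4CT training code: the min-max bounds (per-slice by default here), the schedule constants (`num_steps=1000`, `beta_start=1e-4`, `beta_end=0.02`, the common DDPM defaults), the latent shape, and the stand-in `eps_model` are all hypothetical. The actual preprocessing and scheduler settings live in the DM4CT training script and in the checkpoint's saved scheduler config.

```python
import numpy as np


def rescale_to_unit_range(slice_2d, vmin=None, vmax=None):
    """Min-max rescale a CT slice to [-1, 1].

    Assumption: vmin/vmax default to the slice's own min/max; the
    DM4CT preprocessing may use dataset-wide bounds instead.
    """
    x = np.asarray(slice_2d, dtype=np.float64)
    lo = x.min() if vmin is None else float(vmin)
    hi = x.max() if vmax is None else float(vmax)
    if hi <= lo:
        raise ValueError("vmax must be greater than vmin")
    x = np.clip(x, lo, hi)
    return 2.0 * (x - lo) / (hi - lo) - 1.0


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced betas and their cumulative products alpha_bar_t.

    Assumption: common DDPM default constants, not values read from
    this checkpoint's scheduler config.
    """
    betas = np.linspace(beta_start, beta_end, num_steps)
    alphas_cumprod = np.cumprod(1.0 - betas)
    return betas, alphas_cumprod


def eps_prediction_loss(z0, t, alphas_cumprod, eps_model, rng):
    """DDPM epsilon-prediction MSE for a clean latent z0 at step t.

    In the latent model, z0 would be the VQ-VAE encoding of a rescaled
    slice; eps_model is a hypothetical stand-in for the latent UNet.
    """
    eps = rng.standard_normal(z0.shape)
    a_bar = alphas_cumprod[t]
    # Forward noising: z_t = sqrt(a_bar) * z0 + sqrt(1 - a_bar) * eps
    z_t = np.sqrt(a_bar) * z0 + np.sqrt(1.0 - a_bar) * eps
    eps_hat = eps_model(z_t, t)
    return float(np.mean((eps_hat - eps) ** 2))


# Usage with synthetic data (all shapes and values are hypothetical)
rng = np.random.default_rng(0)
slice_2d = rng.uniform(0.0, 0.05, size=(512, 512))  # fake attenuation values
x = rescale_to_unit_range(slice_2d)
betas, alphas_cumprod = linear_beta_schedule()
z0 = rng.standard_normal((4, 64, 64))  # hypothetical latent shape
zero_model = lambda z_t, t: np.zeros_like(z_t)  # predicts no noise
loss = eps_prediction_loss(z0, 500, alphas_cumprod, zero_model, rng)
```

A model that always predicts zero noise scores a loss near 1 (the variance of the standard Gaussian noise), which is a handy sanity baseline when checking a real training loop.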

---

## 🧠 Training Details

- **Optimizer**: AdamW
- **Learning rate**: 1e-4
- **Hardware**: NVIDIA A100 GPU
- **Training scripts**: Available in the [DM4CT GitHub repository](https://github.com/DM4CT/DM4CT/blob/main/train_latent.py).

---

## 🚀 Usage

You can load and use this model with the `diffusers` library:

```python
from diffusers import LDMPipeline
import torch

pipeline = LDMPipeline.from_pretrained(
    "jiayangshi/lodoind_latent_diffusion"
)
pipeline.to("cuda")

# Generate a sample from the unconditional prior
# (a seeded CPU generator makes the sample reproducible)
generator = torch.Generator().manual_seed(0)
image = pipeline(generator=generator).images[0]
image.save("generated_ct_slice.png")
```

Note: For actual CT reconstruction, this prior is typically used with data-consistency guidance, as described in the paper.

---

## Citation

```bibtex
@inproceedings{shi2026dmct,
  title={{DM}4{CT}: Benchmarking Diffusion Models for Computed Tomography Reconstruction},
  author={Shi, Jiayang and Pelt, Dani{\"e}l M and Batenburg, K Joost},
  booktitle={The Fourteenth International Conference on Learning Representations},
  year={2026},
  url={https://openreview.net/forum?id=YE5scJekg5}
}
```