RecA
Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning!
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
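For intuition, here is a minimal, hypothetical sketch of the reconstruction-alignment idea: the understanding pathway embeds an image, the generation pathway is conditioned on that embedding to reconstruct the image, and the reconstruction loss aligns the two. All names below (`encode_understanding`, `generate_from_embedding`) and the pixel-space loss are illustrative placeholders, not the RecA API; see the paper and repository for the actual objective.

```python
import torch
import torch.nn.functional as F

def reconstruction_alignment_step(model, image: torch.Tensor, optimizer) -> float:
    """One hypothetical reconstruction-alignment training step.

    `model.encode_understanding` and `model.generate_from_embedding` are
    placeholder methods: the understanding branch embeds the image, and
    the generation branch reconstructs it from that embedding alone, with
    no text caption needed (hence self-supervised).
    """
    vis_emb = model.encode_understanding(image)      # (B, N, D) embedding
    recon = model.generate_from_embedding(vis_emb)   # (B, C, H, W) image
    loss = F.mse_loss(recon, image)                  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```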
This repository hosts the model weights for Show-o-512x512-RecA, presented in the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit the RecA GitHub repository and the Project Page. You can also refer to Show-o's original GitHub repository for the base model.
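As a starting point, the checkpoint files can be fetched programmatically with the `huggingface_hub` client; the repo id below is a placeholder for this model page's actual id, and the model-loading code itself lives in the RecA repository.

```python
from huggingface_hub import snapshot_download

# Download all checkpoint files for Show-o-512x512-RecA.
# Replace the placeholder repo id with this model page's actual id.
local_dir = snapshot_download(repo_id="<org>/Show-o-512x512-RecA")
print(f"Checkpoint downloaded to: {local_dir}")
```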
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Show-o-512x512 | 0.67 | 82.21 | 0.40 |
| Show-o-512x512-RecA | 0.72 | 84.94 | 0.40 |
Show-o-512x512-RecA is licensed under the Apache 2.0 license.
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation:
@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}
Base model: showlab/show-o-w-clip-vit-512x512