Show-o-512x512-RecA

A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing.

This repository hosts the model weights for Show-o-512x512-RecA, presented in the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit the RecA GitHub repository and the Project Page. You can also refer to Show-o's original GitHub repository for the base model.

📊 Benchmarks

Model                  GenEval ↑   DPGBench ↑   WISE ↑
Show-o-512x512         0.67        82.21        0.40
Show-o-512x512-RecA    0.72        84.94        0.40
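To make the size of the improvement explicit, the sketch below computes the absolute and relative gain of Show-o-512x512-RecA over the base model from the scores in the table. It is purely illustrative arithmetic. The helper name `score_gains` is hypothetical and is not part of the RecA or Show-o codebases.

```python
# Scores copied from the benchmark table (higher is better for all three).
BASELINE = {"GenEval": 0.67, "DPGBench": 82.21, "WISE": 0.40}
RECA = {"GenEval": 0.72, "DPGBench": 84.94, "WISE": 0.40}


def score_gains(base, tuned):
    """Hypothetical helper: return {benchmark: (absolute gain, relative gain in %)}."""
    gains = {}
    for name, b in base.items():
        delta = tuned[name] - b
        # Round to two decimals to hide floating-point noise from the subtraction.
        gains[name] = (round(delta, 2), round(100 * delta / b, 2))
    return gains


if __name__ == "__main__":
    for name, (abs_gain, rel_gain) in score_gains(BASELINE, RECA).items():
        print(f"{name}: {abs_gain:+.2f} ({rel_gain:+.2f}%)")
```

Running it shows a gain of +0.05 on GenEval (about 7.5% relative), +2.73 on DPGBench (about 3.3% relative), and no change on WISE.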

License

Show-o-512x512-RecA is licensed under the Apache 2.0 license.

✍️ Citation

If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing our paper:

@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}