RecA
Unlocking the Massive Zero-shot Potential in Unified Multimodal Models through Self-supervised Learning!
A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.
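For intuition, here is a minimal, hypothetical sketch of the reconstruction-alignment idea: the understanding pathway embeds an image, the generation pathway is conditioned on that embedding to reconstruct the image, and the reconstruction loss aligns the two. All names below (`encode_understanding`, `generate_from_embedding`) and the pixel-space loss are illustrative placeholders, not the RecA API; see the paper and repository for the actual objective.

```python
import torch
import torch.nn.functional as F

def reconstruction_alignment_step(model, image: torch.Tensor, optimizer) -> float:
    """One hypothetical reconstruction-alignment training step.

    `model.encode_understanding` and `model.generate_from_embedding` are
    placeholder methods: the understanding branch embeds the image, and
    the generation branch reconstructs it from that embedding alone, with
    no text caption needed (hence self-supervised).
    """
    vis_emb = model.encode_understanding(image)      # (B, N, D) embedding
    recon = model.generate_from_embedding(vis_emb)   # (B, C, H, W) image
    loss = F.mse_loss(recon, image)                  # placeholder objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```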
This repository hosts the model weights for Show-o-512x512-RecA, presented in the paper Reconstruction Alignment Improves Unified Multimodal Models. For installation, usage instructions, and further documentation, please visit the RecA GitHub repository and the Project Page. You can also refer to Show-o's original GitHub repository for the base model.
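As a starting point, the checkpoint files can be fetched programmatically with the `huggingface_hub` client; the repo id below is a placeholder for this model page's actual id, and the model-loading code itself lives in the RecA repository.

```python
from huggingface_hub import snapshot_download

# Download all checkpoint files for Show-o-512x512-RecA.
# Replace the placeholder repo id with this model page's actual id.
local_dir = snapshot_download(repo_id="<org>/Show-o-512x512-RecA")
print(f"Checkpoint downloaded to: {local_dir}")
```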
| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Show-o-512x512 | 0.67 | 82.21 | 0.40 |
| Show-o-512x512-RecA | 0.72 | 84.94 | 0.40 |
Show-o-512x512-RecA is licensed under the Apache 2.0 license.
If you find our work inspiring or use our codebase in your research, please consider giving a star ⭐ and a citation:
@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}
Base model: showlab/show-o-w-clip-vit-512x512