
Harmon-0.5B-RecA

A self-supervised training framework that aligns understanding and generation with modest compute, yielding large zero-shot gains in generation and editing capability.

This repository hosts the model weights for Harmon-0.5B-RecA. For installation, usage instructions, and further documentation, please visit Harmon's original GitHub repository.
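Since this repository only hosts the weights, a first step before following the GitHub instructions is usually to fetch them locally. The snippet below is a minimal sketch of one way to do that with `snapshot_download` from the `huggingface_hub` package. The target directory `checkpoints/Harmon-0.5B-RecA` is a hypothetical choice made for illustration, not a path required by Harmon's code; check the GitHub repository for the location its scripts expect.

```python
"""Sketch: fetch the Harmon-0.5B-RecA weights from the Hugging Face Hub."""
from pathlib import Path

# Repository id as it appears on this model page.
REPO_ID = "sanaka87/Harmon-0.5B-RecA"

# Hypothetical local target directory (an assumption, not mandated by Harmon).
DEFAULT_DIR = Path("checkpoints") / "Harmon-0.5B-RecA"


def download_weights(local_dir: Path = DEFAULT_DIR) -> Path:
    """Download every file in the model repository and return the local path."""
    # Imported lazily so this module can be loaded without huggingface_hub
    # installed; install it with `pip install huggingface_hub` before calling.
    from huggingface_hub import snapshot_download

    return Path(snapshot_download(repo_id=REPO_ID, local_dir=str(local_dir)))
```

Calling `download_weights()` needs network access and returns the directory that holds the downloaded files.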

🧠 Method

Paper ArXiv GitHub Hugging Face Collection HF Demo Project Page

📊 Benchmarks

| Model | GenEval ↑ | DPGBench ↑ | WISE ↑ |
|---|---|---|---|
| Harmon-0.5B | 0.68 | 80.12 | 0.33 |
| Harmon-0.5B-RecA | 0.79 | 84.67 | 0.40 |
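To make the size of the improvement concrete, the short sketch below computes the absolute and relative change for each benchmark from the scores reported above. The dictionary and function names are illustrative only. Because the benchmarks are not reported on the same scale, the relative change is shown alongside the absolute one.

```python
"""Sketch: absolute and relative gains of Harmon-0.5B-RecA over Harmon-0.5B."""

# Scores copied from the benchmark table on this page.
BASELINE = {"GenEval": 0.68, "DPGBench": 80.12, "WISE": 0.33}
RECA = {"GenEval": 0.79, "DPGBench": 84.67, "WISE": 0.40}


def gains(before: dict, after: dict) -> dict:
    """Return {benchmark: (absolute gain, relative gain in percent)}."""
    result = {}
    for name, old in before.items():
        new = after[name]
        absolute = new - old
        relative = 100.0 * absolute / old
        result[name] = (round(absolute, 2), round(relative, 1))
    return result


if __name__ == "__main__":
    for name, (absolute, relative) in gains(BASELINE, RECA).items():
        print(f"{name}: +{absolute} ({relative}% relative)")
```

With these numbers the gains are +0.11 on GenEval (about 16% relative), +4.55 on DPGBench (about 6%), and +0.07 on WISE (about 21%).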

License

Harmon-0.5B-RecA is licensed under the Apache 2.0 license.

✍️ Citation

If you find our work inspiring or use our codebase in your research, please consider giving the repository a star ⭐ and citing the paper.

@article{xie2025reconstruction,
  title={Reconstruction Alignment Improves Unified Multimodal Models},
  author={Xie, Ji and Darrell, Trevor and Zettlemoyer, Luke and Wang, XuDong},
  journal={arXiv preprint arXiv:2509.07295},
  year={2025}
}
Base model: wusize/Harmon-0_5B (sanaka87/Harmon-0.5B-RecA is fine-tuned from it)
