---
license: apache-2.0
pipeline_tag: robotics
library_name: transformers
---
# Mantis
> This is the official checkpoint of **Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight**.
- **Paper:** https://arxiv.org/pdf/2511.16175
- **Code:** https://github.com/zhijie-group/Mantis
### 🔥 Highlights
- **Disentangled Visual Foresight** augments action learning without overburdening the backbone.
- **Progressive Training** preserves the understanding capabilities of the backbone.
- **Adaptive Temporal Ensemble** reduces inference cost while maintaining stable control.
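For readers unfamiliar with temporal ensembling, the sketch below shows the *generic, non-adaptive* version of the technique as popularized by ACT-style action-chunking policies. It is **not** the Mantis Adaptive Temporal Ensemble; see the paper and repository for that method. The idea is that every new action chunk predicted at step `t` proposes actions for steps `t, t+1, ..., t+k-1`. The action executed at step `t` is then a weighted mean of all proposals for `t`, with weights `w_i = exp(-m * i)` and `i = 0` being the oldest proposal. The class name `TemporalEnsembler` and the `decay` value (`m`) are hypothetical illustrations.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class TemporalEnsembler:
    """Generic (non-adaptive) temporal ensemble over overlapping action chunks.

    Hypothetical helper for illustration only; not the Mantis implementation.
    """

    def __init__(self, decay: float = 0.01) -> None:
        # decay is the hypothetical m in w_i = exp(-m * i); i = 0 is the oldest proposal.
        self.decay = decay
        # Maps a target timestep to the actions proposed for it, oldest first.
        self.candidates: Dict[int, List[List[float]]] = defaultdict(list)

    def add_chunk(self, t: int, chunk: Sequence[Sequence[float]]) -> None:
        # A chunk predicted at step t proposes one action for each of t, t+1, ...
        for offset, action in enumerate(chunk):
            self.candidates[t + offset].append(list(action))

    def act(self, t: int) -> List[float]:
        # Assumes at least one stored chunk covers step t (otherwise KeyError).
        proposals = self.candidates.pop(t)
        weights = [math.exp(-self.decay * i) for i in range(len(proposals))]
        total = sum(weights)
        dim = len(proposals[0])
        # Normalized, exponentially weighted mean of all proposals for step t.
        return [
            sum(w * p[d] for w, p in zip(weights, proposals)) / total
            for d in range(dim)
        ]


if __name__ == "__main__":
    ens = TemporalEnsembler(decay=0.0)  # decay 0 gives equal weights
    ens.add_chunk(0, [[0.0, 1.0], [1.0, 2.0]])
    print(ens.act(0))  # only one proposal for step 0: [0.0, 1.0]
    ens.add_chunk(1, [[3.0, 4.0], [5.0, 6.0]])
    print(ens.act(1))  # mean of [1.0, 2.0] and [3.0, 4.0]: [2.0, 3.0]
```

This naive scheme queries the policy at every step. As the highlight above states, Mantis's adaptive variant is designed to reduce that inference cost while keeping control stable.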
### How to use
This is the base Mantis model. For detailed usage, please refer to [our repository](https://github.com/zhijie-group/Mantis).
### 📝 Citation
If you find our code or models useful in your work, please cite [our paper](https://arxiv.org/pdf/2511.16175):
```bibtex
@article{yang2025mantis,
  title={Mantis: A Versatile Vision-Language-Action Model with Disentangled Visual Foresight},
  author={Yang, Yi and Li, Xueqi and Chen, Yiyang and Song, Jin and Wang, Yihan and Xiao, Zipeng and Su, Jiadi and Qiaoben, You and Liu, Pengfei and Deng, Zhijie},
  journal={arXiv preprint arXiv:2511.16175},
  year={2025}
}
```