**VideoSSR-8B** is a multimodal large language model (MLLM) fine-tuned from `Qwen-VL-8B-Instruct` for enhanced video understanding. It is trained using a novel **Video Self-Supervised Reinforcement Learning (VideoSSR)** framework, which generates its own high-quality training data directly from videos, eliminating the need for manual annotation.

- **Base Model:** `Qwen-VL-8B-Instruct`
- **Paper:** [VideoSSR: Video Self-Supervised Reinforcement Learning](https://arxiv.org/abs/2511.06281)
- **Code:** [https://github.com/lcqysl/VideoSSR](https://github.com/lcqysl/VideoSSR)
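
To make the idea of annotation-free training data concrete, here is a minimal sketch of one generic self-supervised pretext task: shuffling video segments and asking the model to recover their order. This is an illustration only and is not taken from the VideoSSR code; the task choice and the names `make_jigsaw_sample` and `reward` are assumptions. The point it shows is that the correct answer comes from the video's own structure, so a reward can be checked automatically without human labels.

```python
import random

def make_jigsaw_sample(num_segments, seed=None):
    """Build one self-labelled sample (hypothetical helper, not the VideoSSR API).

    The video is assumed to be split into `num_segments` clips, referred to
    here only by their original index.
    """
    rng = random.Random(seed)
    shown_order = list(range(num_segments))
    rng.shuffle(shown_order)
    # shown_order[p] is the original index of the clip displayed at position p.
    # The answer gives, for each original clip in order, the position it is shown at.
    answer = [shown_order.index(k) for k in range(num_segments)]
    return {"shown_order": shown_order, "answer": answer}

def reward(prediction, answer):
    """Verifiable reward: 1.0 for an exact match with the known answer, else 0.0."""
    return 1.0 if list(prediction) == list(answer) else 0.0

sample = make_jigsaw_sample(4, seed=0)
# Reading the shown clips in the answer's order restores the original sequence.
restored = [sample["shown_order"][p] for p in sample["answer"]]
```

In a reinforcement learning loop, the model's predicted ordering would be passed to `reward` in place of a human-written label; the actual tasks and reward design used by VideoSSR are described in the paper and repository linked above.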