VideoSSR-8B is a multimodal large language model (MLLM) fine-tuned from Qwen-VL-8B-Instruct for improved video understanding. It is trained with the Video Self-Supervised Reinforcement Learning (VideoSSR) framework, which generates its own high-quality training data directly from videos and so removes the need for manual annotation.

- Base Model: Qwen-VL-8B-Instruct
- Paper: VideoSSR: Video Self-Supervised Reinforcement Learning
- Code: https://github.com/lcqysl/VideoSSR
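
To make the idea of generating training data directly from videos more concrete, here is a minimal sketch of one generic self-supervised pretext task: shuffling contiguous segments of a video and asking a model to recover the original order. The task choice, the function names `make_jigsaw_sample` and `reward`, and the exact-match reward are illustrative assumptions, not necessarily what VideoSSR implements; see the paper and code repository for the actual method.

```python
import random
from typing import List, Sequence, Tuple


def make_jigsaw_sample(
    num_frames: int, num_segments: int, rng: random.Random
) -> Tuple[List[Tuple[int, int]], List[int]]:
    """Build one temporal-ordering sample from frame indices alone.

    Hypothetical helper for illustration. Returns the shuffled segments as
    [start, end) frame ranges, plus the answer: for each shuffled position,
    the index of the segment in the original video.
    """
    if not 2 <= num_segments <= num_frames:
        raise ValueError("need 2 <= num_segments <= num_frames")

    # Integer boundaries that split the frames into non-empty segments.
    bounds = [i * num_frames // num_segments for i in range(num_segments + 1)]
    segments = [(bounds[i], bounds[i + 1]) for i in range(num_segments)]

    # The shuffle itself produces the label, so no human annotation is needed.
    order = list(range(num_segments))
    rng.shuffle(order)
    shuffled = [segments[i] for i in order]
    return shuffled, order


def reward(predicted: Sequence[int], answer: Sequence[int]) -> float:
    """Verifiable reward (assumed form): 1.0 on an exact match, else 0.0."""
    return 1.0 if list(predicted) == list(answer) else 0.0


if __name__ == "__main__":
    rng = random.Random(0)
    shuffled, answer = make_jigsaw_sample(num_frames=16, num_segments=4, rng=rng)
    print("shuffled segments:", shuffled)
    print("reward for a correct prediction:", reward(answer, answer))
```

Because the label comes from the transformation applied to the video, the reward can be checked automatically, which is the property a reinforcement learning loop needs.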