---
license: mit
base_model:
- DAMO-NLP-SG/VideoRefer-7B
---

# UFVideo-7B

This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

## 📥 Installation

### Environment

First, clone the repository and navigate to the project folder.

```bash
git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo
```

Then, install the required packages.

```bash
conda create -n UFVideo python=3.10.14
conda activate UFVideo
# Our CUDA version is 'cu124'; other versions have not been verified.
pip install -r requirements.txt
pip install flash-attn --no-build-isolation
```

#### For evaluation and training, please refer to the [UFVideo](https://github.com/Heven-Pan/UFVideo) repository.

## 📑 Citation

Please cite our paper if you find this project helpful.

```
@article{pan2025ufvideo,
  title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
  author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
  journal={arXiv preprint arXiv:2512.11336},
  year={2025}
}
```