---
license: mit
base_model:
- DAMO-NLP-SG/VideoRefer-7B
---

# UFVideo-7B

This repository provides the complete code and datasets for UFVideo, a Video LLM that flexibly unifies general question answering, video object referring, video segmentation, and temporal video grounding to achieve multi-grained video understanding.

<!-- <p align="center"><img width="750" src="https://raw.githubusercontent.com/Heven-Pan/UFVideo/refs/heads/main/figs/overall_tasks.png"></p> -->

## 📥 Installation
### Environment
First, clone the repository and navigate to the project folder.
```bash
git clone https://github.com/Heven-Pan/UFVideo
cd UFVideo
```
Then, install the required packages.
```bash
conda create -n UFVideo python=3.10.14
conda activate UFVideo

# our cuda version is 'cu124'
pip install -r requirements.txt
# other versions have not been verified
pip install flash-attn --no-build-isolation
```
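
After installation, a quick sanity check can confirm that the environment is usable. The snippet below is a minimal sketch, not part of the UFVideo repository: `check_module` is a hypothetical helper, and it assumes that `requirements.txt` installs PyTorch under the module name `torch` and that flash-attn is importable as `flash_attn`.
```bash
# Hypothetical helper (not part of UFVideo): report whether a Python
# module can be imported in the currently active environment.
check_module() {
    if python -c "import $1" >/dev/null 2>&1; then
        echo "$1: ok"
    else
        echo "$1: missing"
    fi
}

# Assumption: requirements.txt provides PyTorch as `torch`,
# and flash-attn is imported as `flash_attn`.
check_module torch
check_module flash_attn

# Print the CUDA version PyTorch was built with, so it can be compared
# against the CUDA build noted in the install step.
python -c "import torch; print(torch.version.cuda)" 2>/dev/null \
    || echo "torch is not importable in this environment"
```
If either module is reported as `missing`, re-running the corresponding `pip install` command inside the activated `UFVideo` environment is a reasonable first step.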

For evaluation and training, please refer to the [UFVideo](https://github.com/Heven-Pan/UFVideo) repository.

## 📑 Citation

Please cite our paper if you find this project helpful.

```bibtex
@article{pan2025ufvideo,
  title={UFVideo: Towards Unified Fine-Grained Video Cooperative Understanding with Large Language Models},
  author={Pan, Hewen and Wei, Cong and Liang, Dashuang and Huang, Zepeng and Gao, Pengfei and Zhou, Ziqi and Xue, Lulu and Yan, Pengfei and Wei, Xiaoming and Li, Minghui and others},
  journal={arXiv preprint arXiv:2512.11336},
  year={2025}
}
```