How to use PaddlePaddle/PaddleOCR-VL with PaddleOCR:

    # See https://www.paddleocr.ai/latest/version3.x/pipeline_usage/PaddleOCR-VL.html for installation instructions
    from paddleocr import PaddleOCRVL

    # Build the document-parsing pipeline
    pipeline = PaddleOCRVL(pipeline_version="v1")

    # Run it on one document image
    output = pipeline.predict("path/to/document_image.png")

    # Print each result, then save it as JSON and as Markdown
    for res in output:
        res.print()
        res.save_to_json(save_path="output")
        res.save_to_markdown(save_path="output")
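The call pattern from the snippet above can be wrapped to cover a folder of page images. This is a minimal sketch, not part of the official usage guide: `parse_folder`, `IMAGE_SUFFIXES`, and the folder names are hypothetical, and the only pipeline calls it relies on are the ones already shown (`predict`, `save_to_json`, `save_to_markdown`).

```python
from pathlib import Path

# Hypothetical filter: which file extensions to treat as page images.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def parse_folder(pipeline, input_dir, output_dir):
    """Run `pipeline` on every image in `input_dir` and save the results.

    `pipeline` is assumed to behave like the PaddleOCRVL object above:
    predict(path) yields results that offer save_to_json(save_path=...)
    and save_to_markdown(save_path=...).
    Returns the names of the files that were processed, in sorted order.
    """
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)

    processed = []
    for image_path in sorted(Path(input_dir).iterdir()):
        # Skip anything that is not one of the assumed image types.
        if image_path.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        for res in pipeline.predict(str(image_path)):
            res.save_to_json(save_path=str(out))
            res.save_to_markdown(save_path=str(out))
        processed.append(image_path.name)
    return processed


# Example call (assumes paddleocr is installed and "scans" exists):
# from paddleocr import PaddleOCRVL
# parse_folder(PaddleOCRVL(pipeline_version="v1"), "scans", "output")
```

Passing the pipeline in as an argument means it is constructed once and reused for every image.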
benchmark with DeepSeek-OCR included
#29
by jzhang533 - opened
OCR has definitely been a hot topic in the community recently, and it has heated up even more since DeepSeek-OCR joined the field.
The PaddleOCR-VL team has evaluated DeepSeek-OCR (Gundam-M setting) and included it in the OmniDocBench v1.5 benchmark. We hope this helps the community better understand diverse OCR approaches and contributes to advancing the field.
Below is the HTML-format version of the benchmark image above, as recognized by PaddleOCR-VL.
| Model Type | Methods | Parameters | Overall↑ | Text Edit↓ | Formula CDM↑ | Table TEDS↑ | Table TEDS-S↑ | Reading Order Edit↓ |
|---|---|---|---|---|---|---|---|---|
| Pipeline Tools | Marker-1.8.2 [45] | - | 71.30 | 0.206 | 76.66 | 57.88 | 71.17 | 0.250 |
| Pipeline Tools | MinerU2-pipeline [14] | - | 75.51 | 0.209 | 76.55 | 70.90 | 79.11 | 0.225 |
| Pipeline Tools | PP-StructureV3 [10] | - | 86.73 | 0.073 | 85.79 | 81.68 | 89.48 | 0.073 |
| General VLMs | GPT-4o [7] | - | 75.02 | 0.217 | 79.70 | 67.07 | 76.09 | 0.148 |
| General VLMs | InternVL3-76B [46] | 76B | 80.33 | 0.131 | 83.42 | 70.64 | 77.74 | 0.113 |
| General VLMs | InternVL3.5-241B [47] | 241B | 82.67 | 0.142 | 87.23 | 75.00 | 81.28 | 0.125 |
| General VLMs | Qwen2.5-VL-72B [24] | 72B | 87.02 | 0.094 | 88.27 | 82.15 | 86.22 | 0.102 |
| General VLMs | Gemini-2.5 Pro [48] | - | 88.03 | 0.075 | 85.82 | 85.71 | 90.29 | 0.097 |
| Specialized VLMs | Dolphin [3] | 322M | 74.67 | 0.125 | 67.85 | 68.70 | 77.77 | 0.124 |
| Specialized VLMs | OCRFlux-3B [49] | 3B | 74.82 | 0.193 | 68.03 | 75.75 | 80.23 | 0.202 |
| Specialized VLMs | Mistral OCR [50] | - | 78.83 | 0.164 | 82.84 | 70.03 | 78.04 | 0.144 |
| Specialized VLMs | POINTS-Reader [4] | 3B | 80.98 | 0.134 | 79.20 | 77.13 | 81.66 | 0.145 |
| Specialized VLMs | olmOCR-7B [12] | 7B | 81.79 | 0.096 | 86.04 | 68.92 | 74.77 | 0.121 |
| Specialized VLMs | MinerU2-VLM [14] | 0.9B | 85.56 | 0.078 | 80.95 | 83.54 | 87.66 | 0.086 |
| Specialized VLMs | Nanonets-OCR-s [51] | 3B | 85.59 | 0.093 | 85.90 | 80.14 | 85.57 | 0.108 |
| Specialized VLMs | DeepSeek-OCR-Gundam-M | 3B | 86.46 | 0.081 | 89.45 | 78.02 | 81.55 | 0.093 |
| Specialized VLMs | MonkeyOCR-pro-1.2B [1] | 1.9B | 86.96 | 0.084 | 85.02 | 84.24 | 89.02 | 0.130 |
| Specialized VLMs | MonkeyOCR-3B [1] | 3.7B | 87.13 | 0.075 | 87.45 | 81.39 | 85.92 | 0.129 |
| Specialized VLMs | dots.ocr [52] | 3B | 88.41 | 0.048 | 83.22 | 86.78 | 90.62 | 0.053 |
| Specialized VLMs | MonkeyOCR-pro-3B [1] | 3.7B | 88.85 | 0.075 | 87.25 | 86.78 | 90.63 | 0.128 |
| Specialized VLMs | MinerU2.5 [2] | 1.2B | 90.67 | 0.047 | 88.46 | 88.22 | 92.38 | 0.044 |
| Specialized VLMs | PaddleOCR-VL | 0.9B | 92.56 | 0.035 | 91.43 | 89.76 | 93.52 | 0.043 |
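
The post does not say how the Overall column is derived. As a hedged sketch, assuming it follows the OmniDocBench v1.5 convention of averaging (1 − Text Edit) × 100, Formula CDM, and Table TEDS, the following recomputes Overall for three rows of the table. The function name `overall_score` is hypothetical, and because the published columns are already rounded, the recomputed values can differ from the reported ones in the last digit.

```python
def overall_score(text_edit, formula_cdm, table_teds):
    """Assumed OmniDocBench v1.5 Overall: mean of three 0-100 scores.

    Text Edit is an edit distance in [0, 1] where lower is better, so it
    is converted to a 0-100 "higher is better" score before averaging.
    """
    text_score = (1.0 - text_edit) * 100.0
    return (text_score + formula_cdm + table_teds) / 3.0


# (Text Edit, Formula CDM, Table TEDS, reported Overall), copied from the table.
rows = {
    "PaddleOCR-VL": (0.035, 91.43, 89.76, 92.56),
    "DeepSeek-OCR-Gundam-M": (0.081, 89.45, 78.02, 86.46),
    "MinerU2.5": (0.047, 88.46, 88.22, 90.67),
}

for name, (text_edit, cdm, teds, reported) in rows.items():
    computed = overall_score(text_edit, cdm, teds)
    print(f"{name}: computed {computed:.2f}, reported {reported:.2f}")
```

Under this assumption, Table TEDS-S and Reading Order Edit do not enter the Overall score, so two models with close Overall values can still differ noticeably on reading order.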
