prithivMLmods posted an update 17 days ago
Try the demo of NVIDIA Nemotron Parse v1.1, NVIDIA's latest VLM for understanding document semantics and extracting text and table elements with spatial grounding. It handles both text understanding and document structure analysis, and can return bounding-box coordinates for the elements it extracts.
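
To make "spatial grounding" a bit more concrete, here is a minimal sketch of how an extracted element and its bounding box might be handled downstream. It is not the model's actual output schema: the `ParsedElement` class, its field names, and the assumption that box coordinates are normalized to the range 0 to 1 are all hypothetical, so check the model page for the real format.

```python
from dataclasses import dataclass


@dataclass
class ParsedElement:
    """Hypothetical container for one extracted element (not the model's real schema)."""
    label: str  # assumed category name, e.g. "Text" or "Table"
    text: str   # extracted content
    box: tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max), assumed normalized to [0, 1]


def to_pixel_box(box, width, height):
    """Scale an assumed-normalized box to pixel coordinates for a width x height image."""
    x_min, y_min, x_max, y_max = box
    return (
        round(x_min * width),
        round(y_min * height),
        round(x_max * width),
        round(y_max * height),
    )


# Example with made-up values: a table occupying the middle of an 800x1000 page.
element = ParsedElement(label="Table", text="| a | b |", box=(0.25, 0.5, 0.75, 0.75))
pixel_box = to_pixel_box(element.box, width=800, height=1000)
print(element.label, pixel_box)
```

Pixel boxes like this are what you would pass to a drawing routine to overlay the detected regions on the page image.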

⭐Space[Demo]: prithivMLmods/NVIDIA-Nemotron-Parse-OCR
⭐Model: nvidia/NVIDIA-Nemotron-Parse-v1.1
⭐Multimodal-Spaces: https://huggingface.co/collections/prithivMLmods/multimodal-implementations

Some relevant Spaces

⭐DeepSeek-OCR-experimental [latest transformers]: prithivMLmods/DeepSeek-OCR-experimental
⭐Qwen3-VL-Outpost: prithivMLmods/Qwen3-VL-Outpost
⭐Multimodal-OCR3: prithivMLmods/Multimodal-OCR3

Check out the other Spaces in the multimodal implementations collection.

To learn more, visit the Space's app page or the corresponding model page!