Multimodal/VLM - an ingridtv Collection

Multimodal/VLM

updated Nov 18

microsoft/Phi-4-multimodal-instruct

Automatic Speech Recognition • 6B • Updated 17 days ago • 284k • 1.55k
microsoft/Phi-4-mini-instruct

Text Generation • 4B • Updated 17 days ago • 241k • 648
SmolDocling: An ultra-compact vision-language model for end-to-end multi-modal document conversion

Paper • 2503.11576 • Published Mar 14 • 121
Emerging Properties in Unified Multimodal Pretraining

Paper • 2505.14683 • Published May 20 • 133
google/medgemma-4b-it

Image-Text-to-Text • 4B • Updated Oct 28 • 386k • 804
kelkalot/medgemma-4b-it-GGUF

4B • Updated May 22 • 219 • 7
Qwen/Qwen3-VL-8B-Instruct

Image-Text-to-Text • 9B • Updated Oct 15 • 2.73M • 604
Qwen/Qwen3-VL-8B-Instruct-GGUF

Image-Text-to-Text • 8B • Updated Nov 1 • 26.2k • 37
Qwen/Qwen3-VL-2B-Instruct-GGUF

Image-Text-to-Text • 2B • Updated Nov 1 • 9.76k • 17
deepseek-ai/DeepSeek-OCR

Image-Text-to-Text • 3B • Updated Nov 4 • 4.26M • 3k