Heron-NVILA Lite 2B GGUFs
This is the Q2_K quantized GGUF version of turing-motors/Heron-NVILA-Lite-2B-hf, a Vision-Language Model optimized for Japanese.
| Component | Details |
|---|---|
| Vision Encoder | SigLIP (paligemma-siglip-so400m-patch14-448) |
| Projector | mlp_downsample_2x2_fix |
| LLM | Qwen2.5-1.5B |
| Total Parameters | ~2B |
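As a sanity check on the ~2B figure, the component sizes roughly add up. A minimal sketch, assuming the commonly cited parameter counts for SigLIP-SO400M (~0.43B) and Qwen2.5-1.5B (~1.54B); these per-component numbers are approximate public figures, not values read from the checkpoint:

```python
# Rough parameter budget for Heron-NVILA-Lite-2B.
# Per-component counts are approximations, not checkpoint values.
components_b = {
    "vision_encoder_siglip_so400m": 0.43,  # ~0.43B params (approx.)
    "projector_mlp_downsample": 0.02,      # small MLP, rough guess
    "llm_qwen2_5_1_5b": 1.54,              # ~1.54B params (approx.)
}

total_b = sum(components_b.values())
print(f"~{total_b:.1f}B total parameters")  # consistent with the ~2B in the table
```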
Note that this model requires the separate mmproj (multimodal projector) file to process images. The LLM itself is available in the following quantizations:
| Quantization | File Size | Use Case |
|---|---|---|
| F16 | 3.3 GB | Maximum quality |
| Q8_0 | 1.8 GB | High quality |
| Q6_K | 1.4 GB | Balanced |
| Q5_K_M | 1.2 GB | Balanced |
| Q4_K_M | 1.1 GB | Efficient |
| Q3_K_M | 0.9 GB | Compact |
| Q2_K | 0.7 GB | Minimum size |
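The file sizes above can be read back as effective bits per weight. A quick sketch of that arithmetic, assuming the LLM GGUF holds roughly 1.6e9 weights (Qwen2.5-1.5B plus embeddings; an estimate, so the implied bpw values are only indicative):

```python
# Effective bits per weight implied by the size table:
# bpw = file_size_bytes * 8 / n_params.
N_PARAMS = 1.6e9  # assumed weight count for the LLM GGUF (estimate)

sizes_gb = {
    "F16": 3.3, "Q8_0": 1.8, "Q6_K": 1.4, "Q5_K_M": 1.2,
    "Q4_K_M": 1.1, "Q3_K_M": 0.9, "Q2_K": 0.7,
}

for quant, gb in sizes_gb.items():
    bpw = gb * 1e9 * 8 / N_PARAMS
    print(f"{quant:7s} ≈ {bpw:.1f} bits/weight")
```

Note that k-quants such as Q2_K keep some tensors at higher precision, which is why the implied bpw sits above the nominal bit width.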
```bash
# Download the model and the mmproj
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-Q2_K-GGUF/resolve/main/Heron-NVILA-Lite-2B-Q2_K.gguf
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-mmproj-GGUF/resolve/main/mmproj-heron-nvila-lite-2b-f16.gguf

# Run inference (the prompt asks, in Japanese: "Please describe this image.")
./llama-mtmd-cli \
  -m Heron-NVILA-Lite-2B-Q2_K.gguf \
  --mmproj mmproj-heron-nvila-lite-2b-f16.gguf \
  --image your_image.jpg \
  -p "この画像について説明してください。"
```
Combined disk footprint (LLM GGUF plus the F16 mmproj):
| Quantization | LLM | mmproj | Total |
|---|---|---|---|
| F16 | 3.3 GB | 807 MB | ~4.1 GB |
| Q8_0 | 1.8 GB | 807 MB | ~2.6 GB |
| Q6_K | 1.4 GB | 807 MB | ~2.2 GB |
| Q5_K_M | 1.2 GB | 807 MB | ~2.0 GB |
| Q4_K_M | 1.1 GB | 807 MB | ~1.9 GB |
| Q3_K_M | 0.9 GB | 807 MB | ~1.7 GB |
| Q2_K | 0.7 GB | 807 MB | ~1.5 GB |
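The totals in the table are simply LLM size plus the fixed 807 MB mmproj; a quick check of that arithmetic:

```python
# Verify the "Total" column: LLM GGUF size plus the fixed 807 MB mmproj.
MMPROJ_GB = 0.807

llm_gb = {
    "F16": 3.3, "Q8_0": 1.8, "Q6_K": 1.4, "Q5_K_M": 1.2,
    "Q4_K_M": 1.1, "Q3_K_M": 0.9, "Q2_K": 0.7,
}

for quant, gb in llm_gb.items():
    print(f"{quant:7s} ≈ {gb + MMPROJ_GB:.1f} GB total")
```

Rounded to one decimal, every row matches the table above.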
This model inherits the license from the original Heron-NVILA-Lite-2B model. Please refer to turing-motors/Heron-NVILA-Lite-2B-hf for license details.