This is the multimodal projector (mmproj) GGUF file for turing-motors/Heron-NVILA-Lite-2B-hf, a Vision-Language Model optimized for Japanese.
The mmproj file contains the vision encoder and projector weights needed to process images with the Heron VLM. It must be loaded alongside an LLM GGUF file to perform vision-language tasks.
lfm2 in llama.cpp)

| Parameter | Value |
|---|---|
| Vision Hidden Size | 1152 |
| Vision Intermediate Size | 4304 |
| Vision Attention Heads | 16 |
| Vision Layers | 27 |
| Projector Scale Factor | 2 (2x2 downsampling) |
| Image Mean | [0.5, 0.5, 0.5] |
| Image Std | [0.5, 0.5, 0.5] |
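
As a rough illustration of the preprocessing values in the table, the sketch below normalizes one 8-bit channel value with mean 0.5 and std 0.5, and counts the tokens left after 2x2 downsampling. The function names and the 28x28 patch grid are hypothetical and chosen only for the example. It also assumes pixel values are scaled to [0, 1] before normalization; the actual preprocessing is handled by the inference runtime.

```bash
#!/usr/bin/env bash

# Hypothetical helper: normalize one 8-bit channel value (0-255).
# Assumes the value is first scaled to [0, 1], then (x - mean) / std
# is applied with mean = std = 0.5 as listed in the table.
normalize_pixel() {
  awk -v p="$1" 'BEGIN { printf "%.4f\n", (p / 255 - 0.5) / 0.5 }'
}

# Hypothetical helper: token count after merging each scale x scale block
# of patch tokens into one. Assumes a square grid whose side is divisible
# by the scale factor.
tokens_after_downsample() {
  local side="$1" scale="$2"
  echo $(( (side / scale) * (side / scale) ))
}

normalize_pixel 0               # black maps to the lower end of the range
normalize_pixel 255             # white maps to the upper end of the range
tokens_after_downsample 28 2    # 28 is an illustrative grid side, not a model value
```

With these values, normalized inputs fall in the range [-1, 1], and each 2x2 merge reduces the number of image tokens by a factor of four.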
This mmproj file works with any of the quantized Heron LLM models:
| Quantization | Repository | File Size |
|---|---|---|
| F16 | nawta/Heron-NVILA-Lite-2B-F16-GGUF | 3.3 GB |
| Q8_0 | nawta/Heron-NVILA-Lite-2B-Q8_0-GGUF | 1.8 GB |
| Q6_K | nawta/Heron-NVILA-Lite-2B-Q6_K-GGUF | 1.4 GB |
| Q5_K_M | nawta/Heron-NVILA-Lite-2B-Q5_K_M-GGUF | 1.2 GB |
| Q4_K_M | nawta/Heron-NVILA-Lite-2B-Q4_K_M-GGUF | 1.0 GB |
| Q3_K_M | nawta/Heron-NVILA-Lite-2B-Q3_K_M-GGUF | 0.8 GB |
| Q2_K | nawta/Heron-NVILA-Lite-2B-Q2_K-GGUF | 0.6 GB |
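
To pick a different quantization from the table, a small helper can build the download URL. This is a sketch that assumes every repository follows the same naming pattern as the Q4_K_M download URL used in this README; `heron_llm_url` is a hypothetical name, and file names for the other quantizations should be checked on the repository page.

```bash
#!/usr/bin/env bash

# Hypothetical helper: build the download URL for a given quantization.
# Assumes the repository and file names follow the Q4_K_M pattern.
heron_llm_url() {
  local quant="$1"
  echo "https://huggingface.co/nawta/Heron-NVILA-Lite-2B-${quant}-GGUF/resolve/main/Heron-NVILA-Lite-2B-${quant}.gguf"
}

# Print the URL only; pass it to wget once the file name is confirmed.
heron_llm_url Q4_K_M
```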
```bash
# Download the mmproj and an LLM model (e.g., Q4_K_M)
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-mmproj-GGUF/resolve/main/mmproj-heron-nvila-lite-2b-f16.gguf
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-Q4_K_M-GGUF/resolve/main/Heron-NVILA-Lite-2B-Q4_K_M.gguf

# Run inference (the prompt means "Please describe this image.")
./llama-mtmd-cli \
  -m Heron-NVILA-Lite-2B-Q4_K_M.gguf \
  --mmproj mmproj-heron-nvila-lite-2b-f16.gguf \
  --image your_image.jpg \
  -p "この画像について説明してください。"
```
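
Because inference needs the LLM file, the mmproj file, and an image together, it can help to confirm that all three exist before launching. The sketch below is one way to do that; `require_files` is a hypothetical helper, not part of llama.cpp.

```bash
#!/usr/bin/env bash

# Hypothetical helper: returns 0 only if every given path is a regular file.
# Missing paths are reported on stderr.
require_files() {
  local missing=0 f
  for f in "$@"; do
    if [ ! -f "$f" ]; then
      echo "missing file: $f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Usage (commented out so the sketch has no side effects when sourced):
# if require_files Heron-NVILA-Lite-2B-Q4_K_M.gguf \
#                  mmproj-heron-nvila-lite-2b-f16.gguf \
#                  your_image.jpg; then
#   echo "all inputs present; ready to run llama-mtmd-cli"
# fi
```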
Total memory usage depends on the chosen LLM quantization. The totals below are the sum of the LLM file size and the 807 MB mmproj file:
| LLM Quantization | LLM | mmproj | Total |
|---|---|---|---|
| F16 | 3.3 GB | 807 MB | ~4.1 GB |
| Q8_0 | 1.8 GB | 807 MB | ~2.6 GB |
| Q6_K | 1.4 GB | 807 MB | ~2.2 GB |
| Q5_K_M | 1.2 GB | 807 MB | ~2.0 GB |
| Q4_K_M | 1.0 GB | 807 MB | ~1.8 GB |
| Q3_K_M | 0.8 GB | 807 MB | ~1.6 GB |
| Q2_K | 0.6 GB | 807 MB | ~1.4 GB |
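
The totals in the table can be reproduced with simple arithmetic. The sketch below treats 807 MB as 0.807 GB (a decimal-unit assumption) and rounds to one decimal place; `total_gb` is a hypothetical name. It covers only the size of the weights, so any additional runtime buffers are not included.

```bash
#!/usr/bin/env bash

# mmproj size from the table: 807 MB, treated as decimal gigabytes (assumption).
MMPROJ_GB=0.807

# Hypothetical helper: add the mmproj size to an LLM size given in GB.
total_gb() {
  awk -v llm="$1" -v mm="$MMPROJ_GB" 'BEGIN { printf "%.1f\n", llm + mm }'
}

# LLM sizes from the table, F16 down to Q2_K.
for size in 3.3 1.8 1.4 1.2 1.0 0.8 0.6; do
  total_gb "$size"
done
```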
The mmproj file contains 443 tensors.
Key tensor mappings from the original model:
| Original | GGUF |
|---|---|
| `vision_tower.vision_model.encoder.layers.X.*` | `v.blk.X.*` |
| `multi_modal_projector.layers.1.*` | `mm.input_norm.*` |
| `multi_modal_projector.layers.2.*` | `mm.1.*` |
| `multi_modal_projector.layers.4.*` | `mm.2.*` |
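
The prefix mappings above can be expressed as a few `sed` substitutions. This is only a sketch of the renaming shown in the table, not the conversion script itself: `map_tensor_name` is a hypothetical name, and the sketch leaves the part after each prefix unchanged, which may differ from what the real converter does.

```bash
#!/usr/bin/env bash

# Hypothetical helper: rewrite an original tensor name prefix to its GGUF
# prefix, following the table. The suffix is passed through unchanged
# (an assumption made for this sketch).
map_tensor_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^vision_tower\.vision_model\.encoder\.layers\.([0-9]+)\./v.blk.\1./' \
    -e 's/^multi_modal_projector\.layers\.1\./mm.input_norm./' \
    -e 's/^multi_modal_projector\.layers\.2\./mm.1./' \
    -e 's/^multi_modal_projector\.layers\.4\./mm.2./'
}

# Illustrative names; "weight" stands in for whatever suffix the tensor has.
map_tensor_name "vision_tower.vision_model.encoder.layers.12.weight"
map_tensor_name "multi_modal_projector.layers.2.weight"
```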
This model inherits the license from the original Heron-NVILA-Lite-2B model. Please refer to turing-motors/Heron-NVILA-Lite-2B-hf for license details.