Heron-NVILA-Lite-2B mmproj GGUF

This is the multimodal projector (mmproj) GGUF file for turing-motors/Heron-NVILA-Lite-2B-hf, a Vision-Language Model optimized for Japanese.

What is mmproj?

The mmproj (multimodal projector) file contains the vision encoder and projector weights that the Heron VLM needs to process images. To run vision-language tasks, load it alongside the LLM GGUF file.

Model Details

  • Vision Encoder: SigLIP (paligemma-siglip-so400m-patch14-448)
  • Projector Type: mlp_downsample_2x2_fix (mapped to lfm2 in llama.cpp)
  • Image Size: 448x448
  • Patch Size: 14
  • Format: GGUF (F16)
  • File Size: 807 MB

Technical Details

Parameter                  Value
Vision Hidden Size         1152
Vision Intermediate Size   4304
Vision Attention Heads     16
Vision Layers              27
Projector Scale Factor     2 (2x2 downsampling)
Image Mean                 [0.5, 0.5, 0.5]
Image Std                  [0.5, 0.5, 0.5]
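
The figures above determine how many image embeddings reach the LLM. The following is a minimal sketch of that arithmetic for a single 448x448 input, ignoring any tiling the runtime may apply. The `merged_dim` line assumes that the 2x2 downsampling concatenates the four neighbouring patch embeddings channel-wise, as in the NVILA reference projector; the model card does not state this, so treat it as an assumption.

```shell
# Values copied from the Model Details and Technical Details sections.
image_size=448
patch_size=14
hidden_size=1152
scale=2

grid=$(( image_size / patch_size ))             # patches per side
patches=$(( grid * grid ))                      # patch embeddings produced by the encoder
tokens=$(( (grid / scale) * (grid / scale) ))   # embeddings left after 2x2 downsampling
# ASSUMPTION: 2x2 neighbours are concatenated along the channel axis.
merged_dim=$(( hidden_size * scale * scale ))

echo "grid=${grid}x${grid} patches=$patches tokens=$tokens merged_dim=$merged_dim"
# prints: grid=32x32 patches=1024 tokens=256 merged_dim=4608
```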

Required LLM Models

This mmproj file works with any of the quantized Heron LLM models (see Memory Requirements for the quantization levels).

Usage with llama.cpp

# Download the mmproj and an LLM model (e.g., Q4_K_M)
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-mmproj-GGUF/resolve/main/mmproj-heron-nvila-lite-2b-f16.gguf
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-Q4_K_M-GGUF/resolve/main/Heron-NVILA-Lite-2B-Q4_K_M.gguf

# Run inference (the prompt is Japanese for "Please describe this image.")
./llama-mtmd-cli \
  -m Heron-NVILA-Lite-2B-Q4_K_M.gguf \
  --mmproj mmproj-heron-nvila-lite-2b-f16.gguf \
  --image your_image.jpg \
  -p "この画像について説明してください。"
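
A failed or interrupted download can leave an HTML error page or a truncated file under the expected name. As an optional pre-flight step, the sketch below checks that both files exist and begin with the 4-byte GGUF magic (`GGUF`). The helper name `is_gguf` is hypothetical and not part of llama.cpp; the check does not validate the rest of the file.

```shell
# Return success if the file exists and starts with the GGUF magic bytes.
is_gguf() {
  [ -f "$1" ] && [ "$(head -c 4 "$1" 2>/dev/null)" = "GGUF" ]
}

# File names match the download commands above.
for f in Heron-NVILA-Lite-2B-Q4_K_M.gguf mmproj-heron-nvila-lite-2b-f16.gguf; do
  if is_gguf "$f"; then
    echo "ok: $f"
  else
    echo "missing or not a GGUF file: $f" >&2
  fi
done
```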

Memory Requirements

Total memory usage depends on the chosen LLM quantization:

LLM Quantization   LLM Size   mmproj Size   Total
F16                3.3 GB     807 MB        ~4.1 GB
Q8_0               1.8 GB     807 MB        ~2.6 GB
Q6_K               1.4 GB     807 MB        ~2.2 GB
Q5_K_M             1.2 GB     807 MB        ~2.0 GB
Q4_K_M             1.0 GB     807 MB        ~1.8 GB
Q3_K_M             0.8 GB     807 MB        ~1.6 GB
Q2_K               0.6 GB     807 MB        ~1.4 GB
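
Each total in the table is the LLM size plus the fixed 807 MB mmproj file. A minimal sketch of that sum, assuming 1 GB = 1000 MB; the function name `estimate_total_gb` is hypothetical and only reproduces the table's arithmetic:

```shell
# Usage: estimate_total_gb <llm_size_gb> <mmproj_size_mb>
# Prints the combined size in GB, rounded to one decimal place.
estimate_total_gb() {
  awk -v llm="$1" -v mm="$2" 'BEGIN { printf "%.1f\n", llm + mm / 1000 }'
}

estimate_total_gb 1.0 807   # Q4_K_M row, prints 1.8
estimate_total_gb 3.3 807   # F16 row, prints 4.1
```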

Tensor Mapping

The mmproj file contains 443 tensors:

  • Vision encoder: 437 tensors (SigLIP architecture)
  • Projector: 6 tensors (LayerNorm + 2 Linear layers)

Key tensor mappings from original model:

Original                                        GGUF
vision_tower.vision_model.encoder.layers.X.*    v.blk.X.*
multi_modal_projector.layers.1.*                mm.input_norm.*
multi_modal_projector.layers.2.*                mm.1.*
multi_modal_projector.layers.4.*                mm.2.*
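
The prefix rewrites in this table can be sketched as a small `sed` filter. This is an illustration only, not the conversion script that produced the file: the function name `map_tensor_name` is hypothetical, and it rewrites just the four prefixes listed above while leaving the rest of the tensor name unchanged, whereas a real conversion may also rename the suffix.

```shell
# Rewrite an original tensor name to its GGUF prefix, per the table above.
# Names that match none of the patterns are printed unchanged.
map_tensor_name() {
  printf '%s\n' "$1" | sed -E \
    -e 's/^vision_tower\.vision_model\.encoder\.layers\.([0-9]+)\./v.blk.\1./' \
    -e 's/^multi_modal_projector\.layers\.1\./mm.input_norm./' \
    -e 's/^multi_modal_projector\.layers\.2\./mm.1./' \
    -e 's/^multi_modal_projector\.layers\.4\./mm.2./'
}

map_tensor_name "multi_modal_projector.layers.1.weight"   # prints mm.input_norm.weight
map_tensor_name "multi_modal_projector.layers.4.bias"     # prints mm.2.bias
```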

License

This model inherits the license from the original Heron-NVILA-Lite-2B model. Please refer to turing-motors/Heron-NVILA-Lite-2B-hf for license details.
