# Heron-NVILA-Lite-2B GGUF (Q3_K_M)

This is the Q3_K_M quantized GGUF version of turing-motors/Heron-NVILA-Lite-2B-hf, a Vision-Language Model optimized for Japanese.

## Model Details

### Model Architecture

| Component | Details |
|-----------|---------|
| Vision Encoder | SigLIP (paligemma-siglip-so400m-patch14-448) |
| Projector | mlp_downsample_2x2_fix |
| LLM | Qwen2.5-1.5B |
| Total Parameters | ~2B |

## Requirements

This model requires the mmproj (multimodal projector) file to process images. It is distributed separately in the companion repository nawta/Heron-NVILA-Lite-2B-mmproj-GGUF; see the download commands below.

## Available Quantizations

| Quantization | File Size | Use Case |
|--------------|-----------|----------|
| F16 | 3.3 GB | Maximum quality |
| Q8_0 | 1.8 GB | High quality |
| Q6_K | 1.4 GB | Balanced |
| Q5_K_M | 1.2 GB | Balanced |
| Q4_K_M | 1.1 GB | Efficient |
| Q3_K_M | 0.9 GB | Compact |
| Q2_K | 0.7 GB | Minimum size |
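The Q3_K_M download link further down suggests each quantization lives in its own repository named after the quant. A minimal sketch of building a download URL for another quantization, assuming that naming pattern holds (verify the repository exists on the Hub before downloading):

```shell
# Build a download URL for a chosen quantization.
# ASSUMPTION: every quant has its own repo named like the Q3_K_M one,
# i.e. nawta/Heron-NVILA-Lite-2B-<QUANT>-GGUF -- check on the Hub first.
quant="Q4_K_M"
repo="nawta/Heron-NVILA-Lite-2B-${quant}-GGUF"
file="Heron-NVILA-Lite-2B-${quant}.gguf"
echo "https://huggingface.co/${repo}/resolve/main/${file}"
```

The `resolve/main/<file>` path is the standard Hugging Face Hub direct-download URL form, matching the wget commands below.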

## Usage with llama.cpp

```bash
# Download the model and mmproj
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-Q3_K_M-GGUF/resolve/main/Heron-NVILA-Lite-2B-Q3_K_M.gguf
wget https://huggingface.co/nawta/Heron-NVILA-Lite-2B-mmproj-GGUF/resolve/main/mmproj-heron-nvila-lite-2b-f16.gguf

# Run inference
./llama-mtmd-cli \
  -m Heron-NVILA-Lite-2B-Q3_K_M.gguf \
  --mmproj mmproj-heron-nvila-lite-2b-f16.gguf \
  --image your_image.jpg \
  -p "この画像について説明してください。"  # "Please describe this image."
```
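Interrupted downloads can leave a truncated or HTML-error file that llama.cpp then rejects. A quick sanity check is the file's 4-byte magic, which the GGUF format fixes as the ASCII string `GGUF`. A minimal sketch, demonstrated on a tiny stand-in file (run the same `head -c 4` on the real `.gguf` after downloading):

```shell
# Every valid GGUF file begins with the 4-byte ASCII magic "GGUF".
# demo.gguf is a stand-in containing only the magic bytes, not a real model;
# substitute the downloaded Heron-NVILA-Lite-2B-Q3_K_M.gguf in practice.
printf 'GGUF' > demo.gguf
magic=$(head -c 4 demo.gguf)
if [ "$magic" = "GGUF" ]; then
  echo "magic OK"
else
  echo "not a GGUF file"
fi
```

If the check fails, re-run the wget commands above; `wget -c` resumes a partial download.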

## Memory Requirements

| Quantization | LLM | mmproj | Total |
|--------------|-----|--------|-------|
| F16 | 3.3 GB | 807 MB | ~4.1 GB |
| Q8_0 | 1.8 GB | 807 MB | ~2.6 GB |
| Q6_K | 1.4 GB | 807 MB | ~2.2 GB |
| Q5_K_M | 1.2 GB | 807 MB | ~2.0 GB |
| Q4_K_M | 1.1 GB | 807 MB | ~1.9 GB |
| Q3_K_M | 0.9 GB | 807 MB | ~1.7 GB |
| Q2_K | 0.7 GB | 807 MB | ~1.5 GB |
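The totals above are simply the quantized LLM file size plus the fixed ~807 MB f16 mmproj. A sketch of that arithmetic for Q3_K_M:

```shell
# Rough memory estimate: quantized LLM file + fixed f16 mmproj,
# using the table's values (Q3_K_M ~= 0.9 GB => 900 MB).
# Actual runtime usage is higher: KV cache and compute buffers are extra.
llm_mb=900
mmproj_mb=807
total_mb=$((llm_mb + mmproj_mb))
echo "~${total_mb} MB total"
```

1707 MB rounds to the ~1.7 GB shown in the table; budget some headroom beyond these figures for context and buffers.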

## License

This model inherits the license of the original Heron-NVILA-Lite-2B model. Please refer to turing-motors/Heron-NVILA-Lite-2B-hf for license details.

