- Libraries
- Transformers
How to use mlx-community/Molmo2-8B-5bit with Transformers:
# Load model directly
from transformers import AutoModelForImageTextToText
model = AutoModelForImageTextToText.from_pretrained("mlx-community/Molmo2-8B-5bit", trust_remote_code=True, dtype="auto")
- MLX
How to use mlx-community/Molmo2-8B-5bit with MLX:
# Download the model from the Hub
pip install "huggingface_hub[hf_xet]"
huggingface-cli download --local-dir Molmo2-8B-5bit mlx-community/Molmo2-8B-5bit
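The same download can also be scripted from Python. The following is a minimal sketch, assuming the `huggingface_hub` package is installed; `snapshot_download` and its `repo_id` and `local_dir` parameters come from that library, while the helper name `download_model` is hypothetical and used only for illustration.

```python
def download_model(repo_id="mlx-community/Molmo2-8B-5bit",
                   local_dir="Molmo2-8B-5bit"):
    """Download the repo's files into local_dir and return the folder path."""
    # Imported lazily so that defining this helper does not require the package.
    from huggingface_hub import snapshot_download

    # Mirrors: huggingface-cli download --local-dir <local_dir> <repo_id>
    return snapshot_download(repo_id=repo_id, local_dir=local_dir)
```

Calling `download_model()` with no arguments would fetch the weights into a `Molmo2-8B-5bit` directory, matching the command-line invocation.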
metadata
license: apache-2.0
datasets:
- allenai/Molmo2-Cap
- allenai/Molmo2-VideoCapQA
- allenai/Molmo2-VideoSubtitleQA
- allenai/Molmo2-AskModelAnything
- allenai/Molmo2-VideoPoint
- allenai/Molmo2-VideoTrack
- allenai/Molmo2-MultiImageQA
- allenai/Molmo2-SynMultiImageQA
- allenai/Molmo2-MultiImagePoint
language:
- en
base_model:
- Qwen/Qwen3-8B
- google/siglip-so400m-patch14-384
pipeline_tag: video-text-to-text
library_name: transformers
tags:
- multimodal
- olmo
- molmo
- molmo2
- mlx
mlx-community/Molmo2-8B-5bit
This model was converted to MLX format from allenai/Molmo2-8B using mlx-vlm version 0.3.10.
Refer to the original model card for more details on the model.
Use with mlx
pip install -U mlx-vlm
python -m mlx_vlm.generate --model mlx-community/Molmo2-8B-5bit --max-tokens 100 --temperature 0.0 --prompt "Describe this image." --image <path_to_image>
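The command above can also be driven from a script, for example to caption several images in a loop. The following is a minimal sketch, assuming `mlx-vlm` is installed in the current Python environment. The helper names `build_generate_command` and `run_generate` are hypothetical, and only the flags shown in the command above are used.

```python
import subprocess
import sys

MODEL_ID = "mlx-community/Molmo2-8B-5bit"


def build_generate_command(image_path, prompt="Describe this image.",
                           max_tokens=100, temperature=0.0, model=MODEL_ID):
    """Return the argument list for the mlx_vlm.generate call shown above."""
    return [
        sys.executable, "-m", "mlx_vlm.generate",
        "--model", model,
        "--max-tokens", str(max_tokens),
        "--temperature", str(temperature),
        "--prompt", prompt,
        "--image", str(image_path),
    ]


def run_generate(image_path, **kwargs):
    """Run the CLI for one image; raises CalledProcessError on failure."""
    # Passing a list (not a shell string) avoids quoting issues in the prompt.
    return subprocess.run(build_generate_command(image_path, **kwargs), check=True)
```

For instance, `run_generate("photo.jpg", max_tokens=200)` would invoke the same module with a longer output budget, where `photo.jpg` stands in for a real image path.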