Video-Text-to-Text
Transformers
Safetensors
English
molmo2
image-text-to-text
multimodal
olmo
molmo
custom_code
4-bit precision
bitsandbytes
Instructions to use Cycl0/Molmo2-VideoPoint-4B-bnb-4bit with libraries, inference providers, notebooks, and local apps. Follow these links to get started.
- Libraries
- Transformers
How to use Cycl0/Molmo2-VideoPoint-4B-bnb-4bit with Transformers:

    # Load model directly
    from transformers import AutoModelForImageTextToText
    model = AutoModelForImageTextToText.from_pretrained("Cycl0/Molmo2-VideoPoint-4B-bnb-4bit", trust_remote_code=True, dtype="auto")

- Notebooks
- Google Colab
- Kaggle
{
  "auto_map": {
    "AutoImageProcessor": "image_processing_molmo2.Molmo2ImageProcessor",
    "AutoProcessor": "processing_molmo2.Molmo2Processor"
  },
  "do_convert_rgb": true,
  "image_mean": [
    0.5,
    0.5,
    0.5
  ],
  "image_processor_type": "Molmo2ImageProcessor",
  "image_std": [
    0.5,
    0.5,
    0.5
  ],
  "max_crops": 8,
  "overlap_margins": [
    4,
    4
  ],
  "patch_size": 14,
  "pooling_size": [
    2,
    2
  ],
  "processor_class": "Molmo2Processor",
  "resample": 2,
  "size": {
    "height": 378,
    "width": 378
  }
}
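
To make the numeric fields in this config easier to read, here is a minimal sketch of the arithmetic they imply. It is not the code in `image_processing_molmo2.py`. The helper names (`normalize_pixel`, `patch_grid`, `pooled_grid`) are hypothetical, the 1/255 rescale before normalization is an assumption (the config has no rescale field), and the ceil-division pooling rule is likewise an assumption about how a 2x2 pooling window would cover an odd-sized grid. The `RESAMPLE_NAMES` table mirrors Pillow's integer resampling constants.

```python
import json
import math

# Values copied from the preprocessor config above.
CONFIG = json.loads("""
{
  "image_mean": [0.5, 0.5, 0.5],
  "image_std": [0.5, 0.5, 0.5],
  "patch_size": 14,
  "pooling_size": [2, 2],
  "resample": 2,
  "size": {"height": 378, "width": 378}
}
""")

# Pillow's integer resampling constants; "resample": 2 selects bilinear.
RESAMPLE_NAMES = {0: "NEAREST", 1: "LANCZOS", 2: "BILINEAR", 3: "BICUBIC"}


def normalize_pixel(rgb, mean, std):
    """Hypothetical helper: scale an 8-bit RGB pixel to [0, 1] (assumed),
    then apply (x - mean) / std per channel."""
    return [((c / 255.0) - m) / s for c, m, s in zip(rgb, mean, std)]


def patch_grid(size, patch_size):
    """Number of non-overlapping patches along (height, width) of one crop."""
    return size["height"] // patch_size, size["width"] // patch_size


def pooled_grid(grid, pooling_size):
    """Assumed pooling rule: ceil division, so an odd grid keeps its last row/column."""
    return tuple(math.ceil(g / p) for g, p in zip(grid, pooling_size))


grid = patch_grid(CONFIG["size"], CONFIG["patch_size"])
pooled = pooled_grid(grid, CONFIG["pooling_size"])

print(RESAMPLE_NAMES[CONFIG["resample"]])  # BILINEAR
print(grid, grid[0] * grid[1])             # (27, 27) 729
print(pooled)                              # (14, 14) under the ceil assumption
print(normalize_pixel([0, 255, 255], CONFIG["image_mean"], CONFIG["image_std"]))
# [-1.0, 1.0, 1.0]
```

With a mean and standard deviation of 0.5, normalization maps the [0, 1] range onto [-1, 1], and a 378x378 crop divides evenly into a 27x27 grid of 14-pixel patches. How `max_crops` and `overlap_margins` tile a larger image is left out here because the config alone does not specify it.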