Diminishing returns over size
Hey guys,
I love your work. I have been using your model since yesterday and am very impressed. Thanks for making it open source! I see that even the smaller models perform quite well, and the bigger models are not dramatically better: they do perform well, but the improvements over the smaller ones are small, though of course still remarkable.
I noticed that whenever there is a lot of information/text in an image, it struggles a bit. I think this is because of the small input resolution of the images, even in the bigger models, which I assume is the same across all sizes. If I can advise you, please train a model that can take up to a megapixel, in a curriculum-learning kind of scenario: the model works well now, but then maybe extend it at the end of training to 4 times the pixels (double the side length). This way I think you can kill it in general OCR, I am pretty sure.
But regardless, this is amazing, and thanks for contributing to society!
Best regards,
HD
Hello!
Thank you for your interest. May I ask how you are running our model? If you're using the code in the quick-start, there is a max_num parameter that can be used to adjust image resolution. The default value is 6, which means that the maximum resolution of the input image is 6x448x448 (six 448x448 tiles), for example 896x1344.
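To make the tile arithmetic concrete, here is a minimal sketch. It assumes each tile is 448x448 and that tiles are laid out in a cols x rows grid with at most max_num tiles. The helper name candidate_resolutions is hypothetical and is not part of the quick-start code, which may select grids differently.

```python
TILE = 448  # tile side length in pixels, as stated in the reply above


def candidate_resolutions(max_num, tile=TILE):
    """Return sorted (width, height) pairs for every cols x rows grid
    that uses at most max_num tiles (illustrative assumption)."""
    grids = {
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if cols * rows <= max_num
    }
    return sorted((cols * tile, rows * tile) for cols, rows in grids)


resolutions = candidate_resolutions(6)
# A 2 x 3 grid of tiles gives the 896x1344 example from the reply.
print((896, 1344) in resolutions)
```

Under these assumptions, raising max_num only adds larger grids to the candidate set; the tile size itself stays at 448.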
If you're using these models on our online demo, you can adjust max_input_tiles in the Advanced Options sidebar on the left side. You can set it to 24, which means that the input resolution has at most 24x448x448 pixels, or about 4.8 million pixels.
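The pixel budgets quoted above can be checked with a few lines of arithmetic. This is only a sketch: max_input_pixels is a hypothetical helper name, and the 448-pixel tile size is taken from the reply.

```python
def max_input_pixels(max_tiles, tile=448):
    """Upper bound on input pixels when the image is split into
    at most max_tiles square tiles of side `tile`."""
    return max_tiles * tile * tile


for n in (6, 12, 24):
    print(n, max_input_pixels(n))
# 6 -> 1204224, 12 -> 2408448, 24 -> 4816896
```

So the default of 6 tiles already allows about 1.2 million pixels, and 24 tiles gives the roughly 4.8 million pixels mentioned above.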
I hope this may help you.
Best regards,
Zhe Chen
I was using the default max_num=6 value, and as soon as I bumped it to 12, the results became much better! Thanks again, man! Swift response!
