Instructions for using OpenGVLab/InternVL2-Llama3-76B with libraries and local apps.

Libraries

How to use OpenGVLab/InternVL2-Llama3-76B with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("image-text-to-text", model="OpenGVLab/InternVL2-Llama3-76B", trust_remote_code=True)
messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "url": "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/p-blog/candy.JPG"},
            {"type": "text", "text": "What animal is on the candy?"}
        ]
    },
]
pipe(text=messages)

# Load model directly
from transformers import AutoModel
model = AutoModel.from_pretrained("OpenGVLab/InternVL2-Llama3-76B", trust_remote_code=True, dtype="auto")

Local Apps

vLLM

How to use OpenGVLab/InternVL2-Llama3-76B with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "OpenGVLab/InternVL2-Llama3-76B"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenGVLab/InternVL2-Llama3-76B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'
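
The same request can also be built from Python. This is a minimal sketch using only the standard library, assuming the vLLM server started above is listening on localhost:8000. The helper names build_chat_request and send are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, prompt, image_url):
    """Build a POST request for an OpenAI-compatible chat completions endpoint."""
    payload = {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": prompt},
                    {"type": "image_url", "image_url": {"url": image_url}},
                ],
            }
        ],
    }
    return urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and return the assistant's reply text (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        body = json.load(response)
    return body["choices"][0]["message"]["content"]


# Same model, prompt, and image as the curl example above.
request = build_chat_request(
    "http://localhost:8000",
    "OpenGVLab/InternVL2-Llama3-76B",
    "Describe this image in one sentence.",
    "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg",
)
```

Calling send(request) performs the network call. For the SGLang server, only the base URL would change, to http://localhost:30000.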

Use Docker

docker model run hf.co/OpenGVLab/InternVL2-Llama3-76B

SGLang

How to use OpenGVLab/InternVL2-Llama3-76B with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "OpenGVLab/InternVL2-Llama3-76B" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "OpenGVLab/InternVL2-Llama3-76B",
		"messages": [
			{
				"role": "user",
				"content": [
					{
						"type": "text",
						"text": "Describe this image in one sentence."
					},
					{
						"type": "image_url",
						"image_url": {
							"url": "https://cdn.britannica.com/61/93061-050-99147DCE/Statue-of-Liberty-Island-New-York-Bay.jpg"
						}
					}
				]
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "OpenGVLab/InternVL2-Llama3-76B" \
        --host 0.0.0.0 \
        --port 30000

Diminishing returns over size

by HriDal - opened Jul 16, 2024

Discussion

HriDal

Jul 16, 2024

Hey guys,
I love your work. I have been using your model since yesterday and am very impressed. Thanks for making it open source! I see that even the smaller models perform quite well, and the bigger models improve on them only modestly: they do perform well, but the gains are small, which of course is still remarkable.

I noticed that the model struggles a bit whenever an image contains a lot of information or text. I think this is because of the small input resolution, which I assume is the same across all model sizes, even the bigger ones. If I may suggest something: please train a model that can take up to a megapixel, perhaps in a curriculum-learning setup where the model is trained as it is now and then, at the end of training, extended to 4 times the pixels (double the side length). I am pretty sure that this way you could kill it in general OCR.

But regardless, this is amazing, and thanks for contributing to society!

Best regards,
HD

czczup

OpenGVLab org, Jul 16, 2024 (edited Jul 16, 2024)

Hello!
Thank you for your interest. How are you running our model? If you're using the quick-start code, there is a max_num parameter that adjusts the image resolution. Its default value is 6, which caps the input at 6 tiles of 448x448 pixels each, for example an 896x1344 image (2x3 tiles).

If you're using these models in our online demo, you can adjust max_input_tiles in the Advanced Options sidebar on the left. Setting it to 24 allows an input of up to 24 tiles of 448x448 pixels, or about 4.8 million pixels.
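
To make the tile arithmetic concrete, here is a small illustrative sketch. It is not the quick-start implementation: the function names max_pixels, candidate_grids, and closest_grid are hypothetical, and the tie-breaking rule is an assumption. It only shows how a tile budget translates into a pixel budget and a tile grid.

```python
TILE = 448  # tile edge length in pixels, as stated in the reply above


def max_pixels(max_num, tile=TILE):
    """Upper bound on input pixels when at most max_num tiles are used."""
    return max_num * tile * tile


def candidate_grids(max_num):
    """All (columns, rows) tile grids that use at most max_num tiles."""
    return [
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if cols * rows <= max_num
    ]


def closest_grid(width, height, max_num):
    """Pick the grid whose aspect ratio is closest to the image's.

    Ties are broken in favour of the grid with more tiles (an assumption
    made for this sketch).
    """
    aspect = width / height
    return min(
        candidate_grids(max_num),
        key=lambda grid: (abs(aspect - grid[0] / grid[1]), -grid[0] * grid[1]),
    )


default_budget = max_pixels(6)    # 1,204,224 pixels, the same as 896 * 1344
demo_budget = max_pixels(24)      # 4,816,896 pixels, about 4.8 million
cols, rows = closest_grid(896, 1344, max_num=6)
resolution = (cols * TILE, rows * TILE)
```

With the default budget of 6 tiles, an 896x1344 image maps onto a 2x3 grid, which matches the example resolution given above.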

I hope this helps.

Best regards,
Zhe Chen

HriDal

Jul 16, 2024

I was using the default max_num=6, and as soon as I bumped it to 12, the results became much better! Thanks again, man! Swift response!

HriDal changed discussion status to closed Jul 16, 2024
