Update README.md
Browse files
README.md
CHANGED
|
@@ -38,10 +38,18 @@ prepare_for_inference(model, backend=backend, verbose=True)
|
|
| 38 |
```
|
| 39 |
|
| 40 |
Use in <a href="https://github.com/vllm-project/vllm/">vllm</a>:
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
|
| 41 |
```python
|
| 42 |
from vllm import LLM
|
| 43 |
from vllm.sampling_params import SamplingParams
import torch
|
| 44 |
|
|
|
|
|
|
|
|
|
|
| 45 |
model_id = "mobiuslabsgmbh/Qwen2.5-VL-3B-Instruct_4bitgs64_hqq_hf"
|
| 46 |
|
| 47 |
llm = LLM(model=model_id, max_model_len=4096, max_num_seqs=2, limit_mm_per_prompt={"image": 1}, dtype=torch.float16)
|
|
|
|
| 38 |
```
|
| 39 |
|
| 40 |
Use in <a href="https://github.com/vllm-project/vllm/">vllm</a>:
|
| 41 |
+
```
|
| 42 |
+
pip install git+https://github.com/mobiusml/hqq/;
|
| 43 |
+
pip install git+https://github.com/mobiusml/gemlite/;
|
| 44 |
+
```
|
| 45 |
+
|
| 46 |
```python
|
| 47 |
from vllm import LLM
|
| 48 |
from vllm.sampling_params import SamplingParams
import torch
|
| 49 |
|
| 50 |
+
from hqq.utils.vllm import set_vllm_hqq_backend, VLLM_HQQ_BACKEND
|
| 51 |
+
set_vllm_hqq_backend(backend=VLLM_HQQ_BACKEND.GEMLITE)
|
| 52 |
+
|
| 53 |
model_id = "mobiuslabsgmbh/Qwen2.5-VL-3B-Instruct_4bitgs64_hqq_hf"
|
| 54 |
|
| 55 |
llm = LLM(model=model_id, max_model_len=4096, max_num_seqs=2, limit_mm_per_prompt={"image": 1}, dtype=torch.float16)
|