Having trouble running
#1 opened by ramendik
Hello,
I tried this on Modal. The vLLM wheels had to be built as documented in the model card, and it was quite hard to get the exact xformers version that the vLLM git repo wanted.
But the one time I did get to inference, all I got back was !!!!!!!
EDIT: I worked out how to get the custom vLLM build running more reliably on Modal. Inference still returns !!!!!!, so somehow the quantization does not work.
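For context, here is a minimal sketch of the kind of Modal setup involved. The GPU type, Python version, model id, and build steps are placeholders (the build commands are adapted from the reply below), not the exact configuration used in this thread.

```python
# Sketch of serving a custom vLLM build from Modal.
# GPU type, model id, and image details are placeholders, not the exact setup from this thread.
import modal

app = modal.App("custom-vllm-test")

image = (
    modal.Image.debian_slim(python_version="3.12")
    .apt_install("git")
    .run_commands(
        "git clone https://github.com/vllm-project/vllm.git /vllm",
        "cd /vllm && VLLM_USE_PRECOMPILED=1 pip install --no-deps .",
        "pip install -r /vllm/requirements/common.txt",
        # remaining pinned dependencies (torch, xformers, flashinfer, ...) omitted for brevity
    )
)

@app.function(image=image, gpu="H100", timeout=1800)
def generate(prompt: str) -> str:
    from vllm import LLM, SamplingParams

    llm = LLM(model="<quantized-model-repo>")  # placeholder model id
    out = llm.generate([prompt], SamplingParams(max_tokens=64))
    return out[0].outputs[0].text
```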
Hi @ramendik, thank you for trying my model.
My apologies for the late response. Could you try the latest vLLM commit using the following commands?
git clone https://github.com/vllm-project/vllm.git
cd vllm
# Install vllm without dependencies
VLLM_USE_PRECOMPILED=1 pip install --no-deps .
# Install all other requirements except xformers
pip install -r requirements/common.txt
pip install numba==0.61.2
pip install "ray[cgraph]>=2.48.0"
pip install torch==2.9.0
pip install torchaudio==2.9.0
pip install torchvision==0.24.0
pip install flashinfer-python==0.4.1
# Install xformers WITHOUT its dependencies to prevent version changes
pip install --no-deps xformers==0.0.33.dev1090
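Once the environment is built, a quick sanity check along these lines should print readable text rather than !!!!!! if the quantized weights load correctly. The model id below is a placeholder for the checkpoint from this repo.

```python
# Sanity check after installing: readable output means the quantization loaded correctly.
# The model id is a placeholder for the quantized checkpoint from this repository.
from vllm import LLM, SamplingParams

llm = LLM(model="<this-repo-quantized-model>")
params = SamplingParams(temperature=0.0, max_tokens=32)
outputs = llm.generate(["Hello, how are you?"], params)
print(outputs[0].outputs[0].text)
```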