paulpak58 committed
Commit 19a7cba · verified · 1 Parent(s): 4846f70

Update README.md

Files changed (1): README.md (+51 -1)
README.md CHANGED
@@ -205,7 +205,57 @@ You can directly run and test the model with this [Colab notebook](https://colab

### 2. vLLM

- vLLM support is coming soon!
+ You can run the model in [`vLLM`](https://github.com/vllm-project/vllm) by building from source:
+
+ ```bash
+ git clone https://github.com/vllm-project/vllm.git
+ cd vllm
+ pip install -e . -v
+ ```
+
+ Here is an example of how to use it for inference:
+
+ ```python
+ from vllm import LLM, SamplingParams
+
+ prompts = [
+     [
+         {
+             "content": "What is C. elegans?",
+             "role": "user",
+         },
+     ],
+     [
+         {
+             "content": "Say hi in JSON format",
+             "role": "user",
+         },
+     ],
+     [
+         {
+             "content": "Define AI in Spanish",
+             "role": "user",
+         },
+     ],
+ ]
+
+ sampling_params = SamplingParams(
+     temperature=0.3,
+     min_p=0.15,
+     repetition_penalty=1.05,
+     max_tokens=30
+ )
+
+ llm = LLM(model="LiquidAI/LFM2-8B-A1B", dtype="bfloat16")
+
+ outputs = llm.chat(prompts, sampling_params)
+
+ for i, output in enumerate(outputs):
+     prompt = prompts[i][0]["content"]
+     generated_text = output.outputs[0].text
+     print(f"Prompt: {prompt!r}, Generated text: {generated_text!r}")
+ ```
+

### 3. llama.cpp
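For reference, the `llm.chat` call added in this commit can also be driven through `llm.generate` by rendering the chat template yourself. The following is a minimal sketch, not part of the commit, and it assumes `transformers` is installed alongside the source-built vLLM:

```python
# Sketch: equivalent inference via llm.generate, applying the chat template
# explicitly (llm.chat performs this step internally).
# Assumption: `transformers` is available in the same environment as vLLM.
from transformers import AutoTokenizer
from vllm import LLM, SamplingParams

model_id = "LiquidAI/LFM2-8B-A1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Render a single-turn conversation into the model's prompt format.
prompt = tokenizer.apply_chat_template(
    [{"role": "user", "content": "What is C. elegans?"}],
    tokenize=False,
    add_generation_prompt=True,
)

sampling_params = SamplingParams(temperature=0.3, min_p=0.15, max_tokens=30)
llm = LLM(model=model_id, dtype="bfloat16")

# generate() takes raw prompt strings instead of chat messages.
outputs = llm.generate([prompt], sampling_params)
print(outputs[0].outputs[0].text)
```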