How to use AI-Sweden-Models/Llama-3-8B-instruct with Transformers:
# Use a pipeline as a high-level helper
from transformers import pipeline
pipe = pipeline("text-generation", model="AI-Sweden-Models/Llama-3-8B-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM
tokenizer = AutoTokenizer.from_pretrained("AI-Sweden-Models/Llama-3-8B-instruct")
model = AutoModelForCausalLM.from_pretrained("AI-Sweden-Models/Llama-3-8B-instruct")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)
outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

How to use AI-Sweden-Models/Llama-3-8B-instruct with vLLM:
# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "AI-Sweden-Models/Llama-3-8B-instruct"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AI-Sweden-Models/Llama-3-8B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
# Or run with Docker:
docker model run hf.co/AI-Sweden-Models/Llama-3-8B-instruct
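The curl call above can also be issued from Python. The following is a minimal sketch using only the standard library; the helper names (`build_chat_request`, `extract_reply`, `ask`) and the `max_tokens` default are illustrative assumptions, not part of the model card, and the response parsing assumes the usual OpenAI-style `choices[0].message.content` layout.

```python
import json
import urllib.request

MODEL_ID = "AI-Sweden-Models/Llama-3-8B-instruct"


def build_chat_request(prompt, base_url="http://localhost:8000", max_tokens=128):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,  # assumed default, tune as needed
    }
    return urllib.request.Request(
        url=f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_reply(response_body):
    """Return the assistant text from an OpenAI-style chat completion body."""
    return json.loads(response_body)["choices"][0]["message"]["content"]


def ask(prompt, base_url="http://localhost:8000"):
    """Send the prompt to a running server and return the reply text."""
    request = build_chat_request(prompt, base_url)
    with urllib.request.urlopen(request, timeout=60) as response:
        return extract_reply(response.read())
```

With a server running, `ask("What is the capital of France?")` should return the model's reply as a string. For a server listening on another port, pass a different `base_url`, for example `"http://localhost:30000"`.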
How to use AI-Sweden-Models/Llama-3-8B-instruct with SGLang:
# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "AI-Sweden-Models/Llama-3-8B-instruct" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AI-Sweden-Models/Llama-3-8B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'
# Alternatively, start the SGLang server with Docker:
docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "AI-Sweden-Models/Llama-3-8B-instruct" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
    -H "Content-Type: application/json" \
    --data '{
        "model": "AI-Sweden-Models/Llama-3-8B-instruct",
        "messages": [
            {
                "role": "user",
                "content": "What is the capital of France?"
            }
        ]
    }'

How to use AI-Sweden-Models/Llama-3-8B-instruct with Docker Model Runner:
docker model run hf.co/AI-Sweden-Models/Llama-3-8B-instruct
Training was performed on the LUMI supercomputer within the DeployAI EU project. The model is based on the base model AI-Sweden-Models/Llama-3-8B.
import transformers
import torch
model_id = "AI-Sweden-Models/Llama-3-8B-instruct"
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)
messages = [
    {"role": "system", "content": "Du är en hjälpsam assistant som svarar klokt och vänligt."},
    {"role": "user", "content": "Hur gör man pannkakor? Och vad behöver man handla? Undrar också vad 5+6 är.."},
]
terminators = [
    pipeline.tokenizer.eos_token_id,
    pipeline.tokenizer.convert_tokens_to_ids("<|eot_id|>")
]
outputs = pipeline(
    messages,
    max_new_tokens=256,
    eos_token_id=terminators,
    do_sample=True,
    temperature=0.6,
    top_p=0.9,
)
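# The last entry of the returned conversation is the assistant message; print only its text
print(outputs[0]["generated_text"][-1]["content"])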
>>> "För att göra pannkakor behöver du följande ingredienser:
- 1 kopp vetemjöl
- 1 tesked bakpulver
- 1/4 tesked salt
- 1 kopp mjölk
- 1 stort ägg
- 2 matskedar smält smör eller olja
För att börja, blanda vetemjölet, bakpulvret och saltet i en bunke. I en annan skål, vispa ihop mjölken, ägget och smöret eller oljan.
Tillsätt de våta ingredienserna till de torra ingredienserna och blanda tills det är väl blandat.
Låt smeten vila i cirka 10 minuter.
För att göra pannkakorna, värm en non-stick-panna eller stekpanna över medelvärme.
När den är varm, häll smeten på pannan och grädda tills kanterna börjar torka ut och toppen är fast.
Vänd pannkakan med en stekspade och grädda den andra sidan tills den är gyllenbrun.
Upprepa med resten av smeten.
När det gäller 5+6 är svaret 11."
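The example above passes `<|eot_id|>` as an extra terminator. A rough sketch of why: in the upstream Llama 3 Instruct prompt format, every turn ends with `<|eot_id|>`, so the model emits that token when its reply is complete. The function below is a hypothetical pure-Python illustration of that layout, assuming this fine-tune keeps the upstream Meta Llama 3 chat template (the model card does not state this). In real code, prefer `tokenizer.apply_chat_template`, which uses the template shipped with the model.

```python
def format_llama3_prompt(messages, add_generation_prompt=True):
    """Render chat messages in the upstream Llama 3 Instruct layout (illustrative only)."""
    parts = ["<|begin_of_text|>"]
    for message in messages:
        # Each turn: a role header, a blank line, the trimmed content, then <|eot_id|>
        parts.append(f"<|start_header_id|>{message['role']}<|end_header_id|>\n\n")
        parts.append(message["content"].strip() + "<|eot_id|>")
    if add_generation_prompt:
        # An open assistant header asks the model to write the next turn
        parts.append("<|start_header_id|>assistant<|end_header_id|>\n\n")
    return "".join(parts)


example_messages = [
    {"role": "system", "content": "Du är en hjälpsam assistant som svarar klokt och vänligt."},
    {"role": "user", "content": "Hur gör man pannkakor?"},
]
prompt = format_llama3_prompt(example_messages)
```

Because the assistant turn is also expected to close with `<|eot_id|>`, listing that token alongside `eos_token_id` lets generation stop at the end of the reply instead of running on to `max_new_tokens`.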
Base model
meta-llama/Meta-Llama-3-8B