Llama-3-Swallow-8B-Instruct-v0.1-kokoroe
Built with Meta Llama 3
Llama-3-Swallow-8B-Instruct-v0.1-kokoroe is a large language model fine-tuned to follow instructions in Japanese, with additional safety tuning applied to make its responses more appropriate. It is based on tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1.
Model Details
Model Description
- Developed by: Retrieva, Inc.
- Model type: Transformer-based Language Model (LlamaForCausalLM)
- Language(s) (NLP): Primarily Japanese
- License: llama3
- Finetuned from model: tokyotech-llm/Llama-3-Swallow-8B-Instruct-v0.1
Uses
This section describes two ways to use the model:
- Huggingface/transformers library
- vLLM library
huggingface/transformers Usage
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

model_id = "retrieva-jp/Llama-3-Swallow-8B-Instruct-v0.1-kokoroe"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype=torch.bfloat16)

chat = [
    # System prompt: "You are a sincere and excellent Japanese assistant."
    {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
    # User prompt: "What is natural language processing?"
    {"role": "user", "content": "自然言語処理とは何か"},
]
# Apply the chat template and move the token IDs to the model's device.
tokenized_input = tokenizer.apply_chat_template(chat, add_generation_prompt=True, tokenize=True, return_tensors="pt").to(model.device)

# Sample up to 100 new tokens without tracking gradients.
with torch.no_grad():
    output = model.generate(
        tokenized_input,
        max_new_tokens=100,
        do_sample=True,
        top_p=0.9,
        temperature=0.6,
    )[0]

# The decoded text includes the prompt followed by the generated reply.
print(tokenizer.decode(output))
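The final `print` in the example above decodes the whole sequence, so the chat prompt and special tokens appear alongside the reply. A minimal sketch for printing only the newly generated text follows. It assumes `tokenized_input` is a tensor of shape `(1, prompt_length)` and `output` is the 1-D sequence returned by the example; `decode_reply` is a hypothetical helper name, not part of transformers.

```python
def decode_reply(tokenizer, output_ids, prompt_length):
    """Decode only the tokens generated after the prompt.

    `decode_reply` is a hypothetical helper for illustration.
    """
    # For decoder-only models, generate() returns the prompt tokens
    # followed by the new tokens, so slicing drops the prompt.
    new_token_ids = output_ids[prompt_length:]
    # skip_special_tokens removes markers such as end-of-turn tokens.
    return tokenizer.decode(new_token_ids, skip_special_tokens=True)


# Usage with the variables from the example above (assumed shapes):
# print(decode_reply(tokenizer, output, tokenized_input.shape[-1]))
```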
vLLM Usage
$ vllm serve retrieva-jp/Llama-3-Swallow-8B-Instruct-v0.1-kokoroe --port 8000
$ curl -X POST http://localhost:8000/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{
      "model": "retrieva-jp/Llama-3-Swallow-8B-Instruct-v0.1-kokoroe",
      "messages": [
        {"role": "system", "content": "あなたは誠実で優秀な日本人のアシスタントです。"},
        {"role": "user", "content": "自然言語処理とは何か"}
      ],
      "max_tokens": 100,
      "top_p": 0.9,
      "temperature": 0.6
    }'
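The same request can be sent from Python. The sketch below uses only the standard library and assumes the server was started with `vllm serve` on localhost port 8000 and exposes vLLM's OpenAI-compatible `/v1/chat/completions` endpoint. The helper names `build_chat_payload` and `send_chat` are hypothetical, and the sampling values simply mirror the ones used above.

```python
import json
import urllib.request

MODEL_ID = "retrieva-jp/Llama-3-Swallow-8B-Instruct-v0.1-kokoroe"
# System prompt from the examples above:
# "You are a sincere and excellent Japanese assistant."
DEFAULT_SYSTEM_PROMPT = "あなたは誠実で優秀な日本人のアシスタントです。"


def build_chat_payload(user_message, system_message=DEFAULT_SYSTEM_PROMPT,
                       max_tokens=100, top_p=0.9, temperature=0.6):
    """Build an OpenAI-style chat completion request body (hypothetical helper)."""
    return {
        "model": MODEL_ID,
        "messages": [
            {"role": "system", "content": system_message},
            {"role": "user", "content": user_message},
        ],
        "max_tokens": max_tokens,
        "top_p": top_p,
        "temperature": temperature,
    }


def send_chat(payload, base_url="http://localhost:8000"):
    """POST the payload and return the assistant's reply text (hypothetical helper)."""
    request = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read().decode("utf-8"))
    # OpenAI-compatible responses carry the reply in choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Usage, with the server running:
# print(send_chat(build_chat_payload("自然言語処理とは何か")))
```

Separating payload construction from the network call keeps the request body easy to inspect or log before anything is sent.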
Model Card Authors
Satoru Katsumata
Model Card Contact
pr[at]retrieva.jp