Instructions to use yodi/karina with libraries, inference providers, notebooks, and local apps.

Libraries

Transformers

How to use yodi/karina with Transformers:

```python
# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="yodi/karina")

# Or load the model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("yodi/karina")
model = AutoModelForCausalLM.from_pretrained("yodi/karina")
```

Notebooks

- Google Colab
- Kaggle

Local Apps

vLLM

How to use yodi/karina with vLLM. Install from pip and serve the model:

```shell
# Install vLLM from pip:
pip install vllm

# Start the vLLM server:
vllm serve "yodi/karina"

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "yodi/karina",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use Docker:

```shell
docker model run hf.co/yodi/karina
```

SGLang

How to use yodi/karina with SGLang. Install from pip and serve the model:

```shell
# Install SGLang from pip:
pip install sglang

# Start the SGLang server:
python3 -m sglang.launch_server \
  --model-path "yodi/karina" \
  --host 0.0.0.0 \
  --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "yodi/karina",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Or use the Docker image:

```shell
docker run --gpus all \
  --shm-size 32g \
  -p 30000:30000 \
  -v ~/.cache/huggingface:/root/.cache/huggingface \
  --env "HF_TOKEN=<secret>" \
  --ipc=host \
  lmsysorg/sglang:latest \
  python3 -m sglang.launch_server \
    --model-path "yodi/karina" \
    --host 0.0.0.0 \
    --port 30000

# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/completions" \
  -H "Content-Type: application/json" \
  --data '{
    "model": "yodi/karina",
    "prompt": "Once upon a time,",
    "max_tokens": 512,
    "temperature": 0.5
  }'
```

Docker Model Runner

How to use yodi/karina with Docker Model Runner:

```shell
docker model run hf.co/yodi/karina
```
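Both vLLM and SGLang expose an OpenAI-compatible `/v1/completions` endpoint, so the curl calls above can also be scripted. Below is a minimal stdlib-only sketch that builds the same request; the helper name `completion_request` is illustrative, and actually sending it (commented out) assumes a vLLM server is already running on `localhost:8000`.

```python
import json
from urllib import request

def completion_request(prompt, model="yodi/karina",
                       base_url="http://localhost:8000"):
    """Build (but do not send) an OpenAI-compatible /v1/completions request."""
    payload = {
        "model": model,
        "prompt": prompt,
        "max_tokens": 512,
        "temperature": 0.5,
    }
    return request.Request(
        f"{base_url}/v1/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

req = completion_request("Once upon a time,")
print(req.full_url)  # http://localhost:8000/v1/completions
# To actually send it (requires a running server):
# with request.urlopen(req) as resp:
#     print(json.load(resp)["choices"][0]["text"])
```

For the SGLang server, only `base_url` changes (port 30000); the payload is identical.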
Model Summary
We present KARINA, finetuned from bigscience/bloomz-3b. BLOOMZ is a family of models capable of following human instructions in dozens of languages zero-shot, created by finetuning BLOOM pretrained multilingual language models on the crosslingual task mixture xP3; the resulting models generalize crosslingually to unseen tasks and languages.
Use
Intended use
We recommend using the model to perform tasks expressed in natural language. For example, given the prompt `f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"` ("who are you?"), the model will most likely answer "Saya Karina. Ada yang bisa saya bantu?" ("I am Karina. Is there anything I can help with?").
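The prompt template can be wrapped in a small helper so callers only supply the question. A minimal sketch (the helper name `build_prompt` is illustrative, not part of the model card):

```python
def build_prompt(question: str) -> str:
    """Wrap a question in the template KARINA was finetuned on."""
    return f"Given the question:\n{{ {question} }}\n---\nAnswer:\n"

print(build_prompt("siapa kamu?"))
# Given the question:
# { siapa kamu? }
# ---
# Answer:
```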
How to use
CPU
```python
# pip install -q transformers
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "yodi/karina"

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForCausalLM.from_pretrained(MODEL_NAME)

# Note: single braces here, matching the template the model was finetuned on
inputs = tokenizer.encode("Given the question:\n{ siapa kamu? }\n---\nAnswer:\n", return_tensors="pt")
outputs = model.generate(inputs)
print(tokenizer.decode(outputs[0]))
```
GPU in 4-bit

```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

# Load the weights quantized to 4-bit (requires bitsandbytes)
model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)
result = generator(prompt, max_length=256)
print(result)
```
GPU in 8-bit

```python
# pip install -q transformers accelerate bitsandbytes
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline

MODEL_NAME = "yodi/karina"

# Load the weights quantized to 8-bit (requires bitsandbytes)
model_8bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_8bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

prompt = f"Given the question:\n{{ siapa kamu? }}\n---\nAnswer:\n"
generator = pipeline('text-generation',
                     model=model_8bit,
                     tokenizer=tokenizer,
                     do_sample=False)
result = generator(prompt, max_length=256)
print(result)
```
Example output:

```python
[{'generated_text': 'Given the question:\n{ siapa kamu? }\n---\nAnswer:\nSaya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain'}]
```
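Because the generation echoes the prompt, the answer has to be split off; the Gradio demo below does exactly this with `re.split`. A minimal sketch applying the same split to the sample output above (the helper name `extract_answer` is illustrative):

```python
import re

def extract_answer(generated_text: str) -> str:
    """Drop the echoed prompt; keep everything after the Answer: delimiter."""
    return re.split(r'\n---\nAnswer:\n', generated_text)[1]

sample = ('Given the question:\n{ siapa kamu? }\n---\nAnswer:\n'
          'Saya Karina, asisten virtual siap membantu seputar estimasi harga '
          'atau pertanyaan lain')
print(extract_answer(sample))
# Saya Karina, asisten virtual siap membantu seputar estimasi harga atau pertanyaan lain
```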
Local inference with Gradio

```python
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
import re
import gradio as gr

MODEL_NAME = "yodi/karina"

model_4bit = AutoModelForCausalLM.from_pretrained(MODEL_NAME, device_map="cuda:1", load_in_4bit=True)
tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)

generator = pipeline('text-generation',
                     model=model_4bit,
                     tokenizer=tokenizer,
                     do_sample=False)

def preprocess(text):
    # Wrap the question in the prompt template the model was finetuned on
    return f"Given the question:\n{{ {text} }}\n---\nAnswer:\n"

def generate(text):
    preprocess_result = preprocess(text)
    result = generator(preprocess_result, max_length=256)
    # The generation echoes the prompt; keep only the answer part
    output = re.split(r'\n---\nAnswer:\n', result[0]['generated_text'])[1]
    return output

with gr.Blocks() as demo:
    input_text = gr.Textbox(label="Input", lines=1)
    button = gr.Button("Submit")
    output_text = gr.Textbox(lines=6, label="Output")
    button.click(generate, inputs=[input_text], outputs=output_text)

demo.queue()  # enable_queue= was removed from launch() in newer Gradio releases
demo.launch(debug=True)
```

Then open the printed Gradio URL in a browser.
Training procedure
The following bitsandbytes quantization config was used during training:
- load_in_8bit: False
- load_in_4bit: True
- llm_int8_threshold: 6.0
- llm_int8_skip_modules: None
- llm_int8_enable_fp32_cpu_offload: False
- llm_int8_has_fp16_weight: False
- bnb_4bit_quant_type: nf4
- bnb_4bit_use_double_quant: True
- bnb_4bit_compute_dtype: float16
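In code, these options correspond to the keyword arguments of transformers' `BitsAndBytesConfig`. A minimal sketch mirroring the list above as a plain dict (kept stdlib-only so it runs without transformers installed; passing it as `BitsAndBytesConfig(**bnb_config)` is an assumption about how it was wired up):

```python
# bitsandbytes quantization settings used during training,
# as they would be passed to transformers.BitsAndBytesConfig(**bnb_config)
bnb_config = {
    "load_in_8bit": False,
    "load_in_4bit": True,
    "llm_int8_threshold": 6.0,
    "llm_int8_skip_modules": None,
    "llm_int8_enable_fp32_cpu_offload": False,
    "llm_int8_has_fp16_weight": False,
    "bnb_4bit_quant_type": "nf4",
    "bnb_4bit_use_double_quant": True,
    "bnb_4bit_compute_dtype": "float16",
}

# 4-bit NF4 with double quantization, fp16 compute
assert bnb_config["load_in_4bit"] and not bnb_config["load_in_8bit"]
```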
Framework versions
- PEFT 0.5.0.dev0
Limitations
Prompt engineering: performance may vary with the prompt, and the model inherits the prompt sensitivity of the underlying BLOOMZ models.
Training
Model
- Architecture: same as BLOOM; see also the config.json file.