This model, `Otilde/gpt-oss-safeguard-20b-MXFP4-Q4-MLX`, was converted to MLX format from `openai/gpt-oss-safeguard-20b` using mlx-lm version 0.28.2.
Install the mlx-lm package:

```bash
pip install mlx-lm
```
Then load the model and generate a response:

```python
from mlx_lm import load, generate

model, tokenizer = load("Otilde/gpt-oss-safeguard-20b-MXFP4-Q4-MLX")

prompt = "hello"

# Wrap the prompt in the model's chat template when one is defined.
if tokenizer.chat_template is not None:
    messages = [{"role": "user", "content": prompt}]
    prompt = tokenizer.apply_chat_template(
        messages, add_generation_prompt=True
    )

response = generate(model, tokenizer, prompt=prompt, verbose=True)
```
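Since gpt-oss-safeguard is a safety-reasoning model, a more representative call supplies a moderation policy as the system message and the content to classify as the user message. Below is a minimal sketch; the policy wording and the ALLOWED/VIOLATING labels are hypothetical examples, not part of this model card:

```python
from mlx_lm import load, generate

model, tokenizer = load("Otilde/gpt-oss-safeguard-20b-MXFP4-Q4-MLX")

# Hypothetical policy: the text and labels below are illustrative
# placeholders, not an official policy format.
policy = (
    "You are a content moderator. Classify the user message as "
    "ALLOWED or VIOLATING under this policy: no spam, no harassment."
)
messages = [
    {"role": "system", "content": policy},
    {"role": "user", "content": "Buy cheap followers now!!!"},
]
prompt = tokenizer.apply_chat_template(
    messages, add_generation_prompt=True
)
response = generate(model, tokenizer, prompt=prompt, max_tokens=512)
print(response)
```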
Benchmark after warmup, with prompt_tokens=512, generation_tokens=1024, batch_size=1 (peak_memory values are presumably GB):

| Trial   | prompt_tps | generation_tps | peak_memory |
|--------:|-----------:|---------------:|------------:|
| 1       | 436.297    | 67.886         | 11.702      |
| 2       | 429.775    | 66.781         | 11.703      |
| 3       | 434.135    | 57.611         | 11.703      |
| 4       | 140.279    | 4.825          | 11.704      |
| 5       | 65.847     | 6.404          | 11.704      |
| Average | 301.266    | 40.701         | 11.703      |
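To get rough throughput numbers on your own machine, here is a minimal sketch using `generate` with `verbose=True`, which prints prompt and generation tokens-per-second plus peak memory (the repeated-word prompt is an approximate stand-in for a 512-token input):

```python
from mlx_lm import load, generate

model, tokenizer = load("Otilde/gpt-oss-safeguard-20b-MXFP4-Q4-MLX")

# Synthetic ~512-token prompt; exact counts depend on the tokenizer.
prompt_ids = tokenizer.encode("hello " * 512)[:512]

# verbose=True prints prompt/generation tokens-per-second and peak memory.
generate(model, tokenizer, prompt=prompt_ids, max_tokens=1024, verbose=True)
```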