Qwen2.5-0.5B-Instruct (MLX, 8-bit)

This repository contains an MLX-converted and 8-bit quantized version of Qwen/Qwen2.5-0.5B-Instruct.

  • No fine-tuning or training was performed
  • Format conversion + post-training quantization only
  • 8-bit was chosen to favor output stability and quality over maximum compression

Usage

pip install -U mlx-lm
mlx_lm.generate \
  --model Irfanuruchi/Qwen2.5-0.5B-Instruct-MLX-8bit \
  --prompt "Write a helpful onboarding message for an iOS app in 3 bullet points."
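The same generation can also be driven from Python. A minimal sketch using the standard mlx-lm Python API (`load` and `generate` from `mlx_lm`); the helper name `generate_reply` is illustrative, and the model downloads on first use:

```python
def generate_reply(prompt: str, max_tokens: int = 100) -> str:
    """Generate a completion with the 8-bit MLX model."""
    # Imports kept inside the function so the sketch can be read
    # (and the function defined) without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load("Irfanuruchi/Qwen2.5-0.5B-Instruct-MLX-8bit")

    # Qwen2.5-Instruct models expect the chat template for best results.
    messages = [{"role": "user", "content": prompt}]
    templated = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=templated, max_tokens=max_tokens)
```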

Bench notes (MacBook Pro M3 Pro)

  • Prompt tokens: 45
  • Generation tokens: 100
  • Generation speed: ~192.9 tokens/sec
  • Peak memory: ~0.565 GB
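The peak-memory figure is consistent with a back-of-envelope estimate: at 8 bits per weight, roughly 0.5B parameters pack into about 0.5 GB, with quantization scales, activations, and the KV cache accounting for the rest. A quick check (the 0.5B parameter count is an approximation):

```python
params = 0.5e9       # approximate parameter count
bits_per_weight = 8  # post-training quantization width

# Weights alone: bits -> bytes -> gigabytes (decimal GB).
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.2f} GB for weights alone")  # ~0.50 GB
```

Group-wise quantization scales and runtime buffers add a little on top, which lines up with the ~0.565 GB peak observed above.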

Tooling

  • mlx-lm: 0.30.2
  • mlx: pulled in as a dependency of mlx-lm (exact version not recorded)

Related models

  • Base model: Qwen/Qwen2.5-0.5B
  • Instruction-tuned source: Qwen/Qwen2.5-0.5B-Instruct
