Qwen2.5-0.5B-Instruct (MLX, 8-bit)

This repository contains an MLX-converted and 8-bit quantized version of Qwen/Qwen2.5-0.5B-Instruct.

  • No fine-tuning or training was performed
  • Format conversion + post-training quantization only
  • 8-bit was chosen to favor output stability and quality over maximum compression

Usage

pip install -U mlx-lm
mlx_lm.generate \
  --model Irfanuruchi/Qwen2.5-0.5B-Instruct-MLX-8bit \
  --prompt "Write a helpful onboarding message for an iOS app in 3 bullet points."
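The same generation can also be driven from Python. A minimal sketch using the standard mlx-lm Python API (`load` and `generate` from `mlx_lm`); the helper name `generate_reply` is illustrative, and the model downloads on first use:

```python
def generate_reply(prompt: str, max_tokens: int = 100) -> str:
    """Generate a completion with the 8-bit MLX model."""
    # Imports kept inside the function so the sketch can be read
    # (and the function defined) without mlx-lm installed.
    from mlx_lm import load, generate

    model, tokenizer = load("Irfanuruchi/Qwen2.5-0.5B-Instruct-MLX-8bit")

    # Qwen2.5-Instruct models expect the chat template for best results.
    messages = [{"role": "user", "content": prompt}]
    templated = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=templated, max_tokens=max_tokens)
```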

Bench notes (MacBook Pro M3 Pro)

  • Prompt tokens: 45
  • Generation tokens: 100
  • Generation speed: ~192.9 tokens/sec
  • Peak memory: ~0.565 GB
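The peak-memory figure is consistent with a back-of-envelope estimate: at 8 bits per weight, roughly 0.5B parameters pack into about 0.5 GB, with quantization scales, activations, and the KV cache accounting for the rest. A quick check (the 0.5B parameter count is an approximation):

```python
params = 0.5e9       # approximate parameter count
bits_per_weight = 8  # post-training quantization width

# Weights alone: bits -> bytes -> gigabytes (decimal GB).
weight_gb = params * bits_per_weight / 8 / 1e9
print(f"~{weight_gb:.2f} GB for weights alone")  # ~0.50 GB
```

Group-wise quantization scales and runtime buffers add a little on top, which lines up with the ~0.565 GB peak observed above.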

Tooling

  • mlx-lm: 0.30.2
  • mlx: pulled in as a dependency of mlx-lm (exact version not recorded)

Related models

  • Base model: Qwen/Qwen2.5-0.5B
  • Instruction-tuned source: Qwen/Qwen2.5-0.5B-Instruct
