MagicQuant GGUF Hybrids - Qwen3 4B Thinking 2507

MagicQuant is an automated quantization, benchmarking, and evolutionary hybrid-GGUF search system for LLMs.

Each release includes models optimized to outperform the standard baseline quants (Q8, Q6, Q5, Q4). If a baseline GGUF exists in this repo, it's because the evolutionary engine couldn't beat it. If a baseline is missing, a hybrid configuration outperformed it so thoroughly that including it would have been pointless.

These hybrid GGUFs are built to be as small, fast, and low-drift as possible while preserving model capability.

To dive deeper into how MagicQuant works, see the main repo: MagicQuant on GitHub (by MagicCodingMan)
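
To give a rough feel for the "evolutionary hybrid-GGUF search" described above, here is a minimal conceptual sketch of such a loop. Everything in it — the tensor-group names, quant-type list, mutation scheme, and scoring hook — is an illustrative assumption, not MagicQuant's actual code; see the GitHub repo for the real implementation.

```python
import random

# Illustrative tensor groups and candidate quant types (assumptions for this sketch).
TENSOR_GROUPS = ["embed", "attn_q", "attn_k", "attn_o", "ffn_up", "ffn_down"]
QUANT_TYPES = ["Q8_0", "Q6_K", "Q5_K", "IQ4_NL", "MXFP4"]

def random_hybrid():
    # One candidate: a quant type chosen independently per tensor group.
    return {g: random.choice(QUANT_TYPES) for g in TENSOR_GROUPS}

def mutate(hybrid, rate=0.2):
    # Re-roll a few group assignments to explore nearby mixes.
    return {g: random.choice(QUANT_TYPES) if random.random() < rate else q
            for g, q in hybrid.items()}

def evolve(score_fn, generations=20, population=16, keep=4):
    # score_fn(hybrid) -> float; in a real system it would build the candidate GGUF,
    # measure size, TPS, and perplexity drift vs. BF16, and combine them. Lower = better.
    pool = [random_hybrid() for _ in range(population)]
    for _ in range(generations):
        parents = sorted(pool, key=score_fn)[:keep]          # keep the best candidates
        pool = parents + [mutate(random.choice(parents))     # refill with mutated children
                          for _ in range(population - keep)]
    return min(pool, key=score_fn)

# Toy usage with an arbitrary stand-in score (purely so the sketch runs):
toy_cost = {"Q8_0": 5, "Q6_K": 4, "Q5_K": 3, "IQ4_NL": 2, "MXFP4": 1}
best = evolve(lambda h: sum(toy_cost[q] for q in h.values()))
print(best)
```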

Notes:

  • The Hugging Face hardware-compatibility estimate (the part that shows bit widths) is usually wrong for these files. It doesn't understand hybrid mixes, so don't trust it.
  • The naming scheme is documented on the MagicQuant Wiki.
  • Tips: less precision loss means less brain damage, more TPS means faster, and smaller is always better... right?

Precision Loss Guide

  • 0–0.1% → God-tier, scientifically exact
  • 0.1–1% → True near-lossless, agent-ready
  • 1–3% → Minimal loss, great for personal use
  • 3–5% → Borderline, but still functional
  • 5%+ → Toys, not tools, outside MagicQuant’s scope

Learn more about precision loss here.
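
As a quick illustration of how these bands map onto the avg_prec_loss column in the tables below, here is a tiny classifier. The function name and structure are purely illustrative, not part of MagicQuant.

```python
def precision_tier(avg_loss_pct: float) -> str:
    # Thresholds mirror the guide above; avg_loss_pct is avg_prec_loss in percent.
    if avg_loss_pct <= 0.1:
        return "God-tier, scientifically exact"
    if avg_loss_pct <= 1.0:
        return "True near-lossless, agent-ready"
    if avg_loss_pct <= 3.0:
        return "Minimal loss, great for personal use"
    if avg_loss_pct <= 5.0:
        return "Borderline, but still functional"
    return "Toys, not tools"

print(precision_tier(0.1278))  # -> "True near-lossless, agent-ready"
```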

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 3.90 | 369.09 | 0.0989% |
| mxfp4_moe-Q-Q5K-EKOUD-Q6K | 3.03 | 394.06 | 0.1278% |
| iq4_nl-EQKOUD-Q6K | 3.08 | 413.99 | 0.1740% |
| mxfp4_moe-QK-IQ4NL-O-MXFP4-EUD-Q6K | 2.84 | 430.23 | 0.3832% |
| Q5_K | 2.69 | 375.72 | 0.5973% |
| Q4_K_M | 2.33 | 366.54 | 1.6668% |
| mxfp4_moe-QKU-IQ4NL-O-MXFP4-D-Q5K-E-Q6K | 2.30 | 412.13 | 2.2740% |
| IQ4_NL | 2.23 | 450.75 | 2.4657% |
| mxfp4_moe-EQOU-IQ4NL-KD-Q6K | 2.37 | 472.25 | 2.5049% |

Table - PPL Columns

| model_name | gen PPL | ±err | code PPL | ±err | math PPL | ±err |
|---|---|---|---|---|---|---|
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 10.0081 | 0.2450 | 1.5936 | 0.0128 | 6.9001 | 0.1413 |
| mxfp4_moe-Q-Q5K-EKOUD-Q6K | 9.9957 | 0.2441 | 1.5922 | 0.0127 | 6.9036 | 0.1412 |
| iq4_nl-EQKOUD-Q6K | 9.9687 | 0.2431 | 1.5927 | 0.0127 | 6.8924 | 0.1409 |
| mxfp4_moe-QK-IQ4NL-O-MXFP4-EUD-Q6K | 10.0858 | 0.2460 | 1.5949 | 0.0126 | 6.9032 | 0.1403 |
| Q5_K | 10.0993 | 0.2473 | 1.5978 | 0.0128 | 6.9256 | 0.1413 |
| Q4_K_M | 10.3239 | 0.2536 | 1.6093 | 0.0129 | 6.9423 | 0.1412 |
| mxfp4_moe-QKU-IQ4NL-O-MXFP4-D-Q5K-E-Q6K | 10.4164 | 0.2569 | 1.6143 | 0.0130 | 6.9825 | 0.1423 |
| IQ4_NL | 10.3718 | 0.2548 | 1.6125 | 0.0129 | 7.0606 | 0.1452 |
| mxfp4_moe-EQOU-IQ4NL-KD-Q6K | 10.3780 | 0.2547 | 1.6178 | 0.0132 | 7.0415 | 0.1443 |

Table - Precision Loss Columns

| model_name | loss_general (%) | loss_code (%) | loss_math (%) |
|---|---|---|---|
| mxfp4_moe-O-Q6K-EQKUD-Q8_0 | 0.0250 | 0.1194 | 0.1524 |
| mxfp4_moe-Q-Q5K-EKOUD-Q6K | 0.1488 | 0.0314 | 0.2032 |
| iq4_nl-EQKOUD-Q6K | 0.4186 | 0.0628 | 0.0406 |
| mxfp4_moe-QK-IQ4NL-O-MXFP4-EUD-Q6K | 0.7512 | 0.2010 | 0.1974 |
| Q5_K | 0.8861 | 0.3832 | 0.5225 |
| Q4_K_M | 3.1297 | 1.1057 | 0.7649 |
| mxfp4_moe-QKU-IQ4NL-O-MXFP4-D-Q5K-E-Q6K | 4.0537 | 1.4199 | 1.3484 |
| IQ4_NL | 3.6082 | 1.3068 | 2.4820 |
| mxfp4_moe-EQOU-IQ4NL-KD-Q6K | 3.6701 | 1.6398 | 2.2048 |

Baseline Models (Reference)

Table - File Size + TPS + Avg Precision Loss

| model_name | file_size_gb | bench_tps | avg_prec_loss |
|---|---|---|---|
| BF16 | 7.50 | 249.86 | 0.0000% |
| Q8_0 | 3.99 | 360.78 | 0.1028% |
| Q6_K | 3.08 | 404.72 | 0.1740% |
| Q5_K | 2.69 | 375.72 | 0.5973% |
| Q4_K_M | 2.33 | 366.54 | 1.6668% |
| IQ4_NL | 2.23 | 450.75 | 2.4657% |
| MXFP4_MOE | 2.00 | 466.66 | 7.9498% |

Table - PPL Columns

| model_name | gen PPL | ±err | code PPL | ±err | math PPL | ±err |
|---|---|---|---|---|---|---|
| BF16 | 10.0106 | 0.2451 | 1.5917 | 0.0127 | 6.8896 | 0.1410 |
| Q8_0 | 10.0174 | 0.2454 | 1.5931 | 0.0128 | 6.9001 | 0.1413 |
| Q6_K | 9.9687 | 0.2431 | 1.5927 | 0.0127 | 6.8924 | 0.1409 |
| Q5_K | 10.0993 | 0.2473 | 1.5978 | 0.0128 | 6.9256 | 0.1413 |
| Q4_K_M | 10.3239 | 0.2536 | 1.6093 | 0.0129 | 6.9423 | 0.1412 |
| IQ4_NL | 10.3718 | 0.2548 | 1.6125 | 0.0129 | 7.0606 | 0.1452 |
| MXFP4_MOE | 10.9465 | 0.2659 | 1.6645 | 0.0138 | 7.5735 | 0.1563 |

Table - Precision Loss Columns

| model_name | loss_general (%) | loss_code (%) | loss_math (%) |
|---|---|---|---|
| BF16 | 0.0000 | 0.0000 | 0.0000 |
| Q8_0 | 0.0679 | 0.0880 | 0.1524 |
| Q6_K | 0.4186 | 0.0628 | 0.0406 |
| Q5_K | 0.8861 | 0.3832 | 0.5225 |
| Q4_K_M | 3.1297 | 1.1057 | 0.7649 |
| IQ4_NL | 3.6082 | 1.3068 | 2.4820 |
| MXFP4_MOE | 9.3491 | 4.5737 | 9.9266 |
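
The loss columns are consistent with a simple relative-perplexity-drift calculation against the BF16 reference, with avg_prec_loss being the mean of the three per-domain losses. The formula below is inferred from the tables rather than quoted from MagicQuant's docs, so treat it as an assumption; the sketch reproduces the Q8_0 row from the PPL tables above.

```python
def prec_loss_pct(ppl_quant: float, ppl_bf16: float) -> float:
    # Absolute relative perplexity drift vs. the BF16 reference, in percent.
    return abs(ppl_quant - ppl_bf16) / ppl_bf16 * 100

# Reproducing the Q8_0 baseline row from the PPL tables above:
bf16 = {"gen": 10.0106, "code": 1.5917, "math": 6.8896}
q8_0 = {"gen": 10.0174, "code": 1.5931, "math": 6.9001}

losses = {k: prec_loss_pct(q8_0[k], bf16[k]) for k in bf16}
avg = sum(losses.values()) / len(losses)

print(losses)  # ~ {'gen': 0.0679, 'code': 0.0880, 'math': 0.1524}
print(avg)     # ~ 0.1028 -> matches Q8_0's 0.1028% avg_prec_loss
```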

Support

I’m a solo developer working full time for myself to achieve my dream, pouring nights and weekends into open protocols and tools that I hope make the world a little better. If you chip in, you're helping me keep the lights on while I keep shipping.

Click here to see ways to support: BTC, PayPal, GitHub Sponsors.

Or, just drop a like on the repo :)
