Models (137)

Active filters: vLLM

QuantTrio/GLM-4.5V-AWQ

Image-Text-to-Text • 17B • Updated Aug 25, 2025 • 295 • 19

QuantTrio/Seed-OSS-36B-Instruct-AWQ

Text Generation • 36B • Updated Sep 15, 2025 • 254 • 8

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int8

Text Generation • 36B • Updated Sep 15, 2025 • 136 • 4

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int4

Text Generation • 36B • Updated Sep 15, 2025 • 38 • 5

QuantTrio/Seed-OSS-36B-Instruct-GPTQ-Int3

Text Generation • 34B • Updated Sep 15, 2025 • 4 • 3

amakhov/tiny-random-llama

Text Generation • 4.18M • Updated Aug 21, 2025 • 7

QuantTrio/KAT-V1-40B-AWQ

Text Generation • 41B • Updated Sep 5, 2025 • 2 • 2

QuantTrio/DeepSeek-V3.1-AWQ

Text Generation • 485B • Updated Aug 27, 2025 • 856 • 5

QuantTrio/DeepSeek-V3.1-AWQ-Fp16Mix

Text Generation • 286B • Updated Aug 27, 2025 • 7 • 1

QuantTrio/DeepSeek-V3.1-AWQ-Lite

Text Generation • 684B • Updated Sep 5, 2025 • 170 • 3

JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int4

Text Generation • 4B • Updated Sep 4, 2025 • 75.4k • 2

JunHowie/Qwen3-4B-Instruct-2507-GPTQ-Int8

Text Generation • 4B • Updated Sep 4, 2025 • 12

JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int4

Text Generation • 4B • Updated Sep 4, 2025 • 1.28k • 1

JunHowie/Qwen3-4B-Thinking-2507-GPTQ-Int8

Text Generation • 4B • Updated Sep 4, 2025 • 147 • 2

JunHowie/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int4

Text Generation • 31B • Updated Sep 8, 2025 • 1.88k

JunHowie/Qwen3-30B-A3B-Instruct-2507-GPTQ-Int8

Text Generation • 31B • Updated Sep 8, 2025 • 1

JunHowie/Qwen3-30B-A3B-Thinking-2507-GPTQ-Int4

Text Generation • 31B • Updated Sep 8, 2025 • 93

JunHowie/Qwen2-7B-Instruct-GPTQ-Int4

Text Generation • 8B • Updated Sep 3, 2025 • 1

JunHowie/Qwen2-7B-Instruct-GPTQ-Int8

Text Generation • 8B • Updated Sep 3, 2025 • 2

EliovpAI/Deepseek-R1-0528-Qwen3-8B-FP8-KV

Text Generation • 8B • Updated Sep 18, 2025

JunHowie/Qwen3-30B-A3B-Thinking-2507-GPTQ-Int8

Text Generation • 31B • Updated Sep 8, 2025

JunHowie/Seed-OSS-36B-Instruct-GPTQ-Int4

Text Generation • 36B • Updated Sep 15, 2025 • 1

JunHowie/Seed-OSS-36B-Instruct-GPTQ-Int8

Text Generation • 36B • Updated Sep 15, 2025 • 1

QuantTrio/Qwen3-VL-235B-A22B-Instruct-AWQ

Text Generation • 236B • Updated Oct 8, 2025 • 2.48k • 13

QuantTrio/Qwen3-VL-235B-A22B-Instruct-FP8

Text Generation • Updated Oct 8, 2025 • 31

QuantTrio/Qwen3-VL-235B-A22B-Thinking-AWQ

Text Generation • 236B • Updated Oct 8, 2025 • 662 • 8

QuantTrio/Qwen3-VL-235B-A22B-Thinking-FP8

Text Generation • 236B • Updated Oct 8, 2025 • 29

QuantTrio/DeepSeek-V3.2-Exp-AWQ

Text Generation • 486B • Updated Oct 1, 2025 • 57 • 4

QuantTrio/DeepSeek-V3.2-Exp-AWQ-Lite

Text Generation • 685B • Updated Oct 1, 2025 • 50 • 4

QuantTrio/GLM-4.6-AWQ

Text Generation • 50B • Updated Oct 2, 2025 • 140 • 5