Models: 33

Active filters: nm-vllm

RedHatAI/TinyLlama-1.1B-Chat-v1.0-pruned2.4

Text Generation • Updated Mar 5, 2024 • 20 • 1

RedHatAI/MiniChat-2-3B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 4

RedHatAI/OpenHermes-2.5-Mistral-7B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 9

RedHatAI/OpenHermes-2.5-Mistral-7B-pruned50

Text Generation • Updated Mar 5, 2024 • 39 • 1

RedHatAI/Nous-Hermes-2-SOLAR-10.7B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 1

RedHatAI/Nous-Hermes-2-Yi-34B-pruned2.4

Text Generation • Updated Mar 5, 2024 • 2

RedHatAI/Nous-Hermes-2-Yi-34B-pruned50

Text Generation • Updated Mar 5, 2024 • 1

RedHatAI/zephyr-7b-beta-marlin

Text Generation • 1B • Updated Mar 6, 2024 • 23

RedHatAI/llama2.c-stories110M-pruned2.4

Text Generation • Updated Mar 5, 2024 • 5

RedHatAI/llama2.c-stories110M-pruned50

Text Generation • Updated Mar 5, 2024 • 1.05k

RedHatAI/phi-2-pruned50

Text Generation • 3B • Updated Mar 5, 2024 • 1

RedHatAI/TinyLlama-1.1B-Chat-v1.0-marlin

Text Generation • 0.3B • Updated Mar 6, 2024 • 316 • 2

RedHatAI/OpenHermes-2.5-Mistral-7B-marlin

Text Generation • 1B • Updated Mar 6, 2024 • 90 • 2

RedHatAI/Nous-Hermes-2-Yi-34B-marlin

Text Generation • 5B • Updated Mar 6, 2024 • 3 • 5

softmax/Llama-2-70b-chat-hf-marlin

Text Generation • 10B • Updated Mar 17, 2024 • 1

softmax/falcon-180B-chat-marlin

Text Generation • 26B • Updated Mar 21, 2024 • 3

dtransposed/llama2.c-stories110M-pruned50-compressed-tensors

Text Generation • Updated Apr 23, 2024 • 3

mradermacher/Nous-Hermes-2-SOLAR-10.7B-pruned2.4-GGUF

11B • Updated Apr 10, 2025 • 95

mradermacher/Nous-Hermes-2-SOLAR-10.7B-pruned2.4-i1-GGUF

11B • Updated Apr 10, 2025 • 258

tensorblock/llama2.c-stories110M-pruned50-GGUF

0.1B • Updated Jan 27 • 41

mradermacher/phi-2-pruned50-GGUF

3B • Updated Aug 1, 2025 • 169

mradermacher/llama2.c-stories110M-pruned50-GGUF

0.1B • Updated Apr 10, 2025 • 86

mradermacher/OpenHermes-2.5-Mistral-7B-pruned50-GGUF

7B • Updated Apr 10, 2025 • 31 • 1

mradermacher/MiniChat-2-3B-pruned2.4-GGUF

3B • Updated Apr 10, 2025 • 52

mradermacher/OpenHermes-2.5-Mistral-7B-pruned50-i1-GGUF

7B • Updated Apr 10, 2025 • 60

mradermacher/llama2.c-stories110M-pruned50-i1-GGUF

0.1B • Updated Apr 10, 2025 • 84

mradermacher/OpenHermes-2.5-Mistral-7B-pruned2.4-GGUF

7B • Updated Apr 10, 2025 • 46

mradermacher/OpenHermes-2.5-Mistral-7B-pruned2.4-i1-GGUF

7B • Updated Apr 10, 2025 • 93

tensorblock/OpenHermes-2.5-Mistral-7B-pruned2.4-GGUF

7B • Updated Jan 27 • 17

tensorblock/OpenHermes-2.5-Mistral-7B-pruned50-GGUF

7B • Updated Jan 27 • 13