- meta-llama/Llama-3.2-1B-Instruct • Text Generation • 1B • Updated • 5.03M • 1.38k
- inference-optimization/Llama-3.2-1B-Instruct-FP8-Dynamic • 1B • Updated • 23
- inference-optimization/Llama-3.2-1B-Instruct-NVFP4 • 0.8B • Updated • 37
- inference-optimization/Llama-3.2-1B-Instruct-5-bits-mode-heuristic-per-tensor • 1B • Updated • 16
Inference Optimization
community
AI & ML interests: None defined yet.
Mixed Precision Models

- meta-llama/Llama-3.1-8B-Instruct • Text Generation • 8B • Updated • 9.19M • 5.75k
- RedHatAI/Meta-Llama-3.1-8B-Instruct-FP8-dynamic • Text Generation • 8B • Updated • 45.4k • 9
- RedHatAI/Llama-3.1-8B-Instruct-NVFP4 • Text Generation • 5B • Updated • 19.7k • 1
- inference-optimization/Llama-3.1-8B-Instruct_5_bits_mode_hybrid • 6B • Updated • 11
Models (319)
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-noise-per-tensor • 26B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-hybrid-per-tensor • 27B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-7-bits-mode-heuristic-per-tensor • 27B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-noise-per-tensor • 25B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-hybrid-per-tensor • 25B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6.5-bits-mode-heuristic-per-tensor • 25B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-noise-per-tensor • 23B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-hybrid-per-tensor • 23B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-6-bits-mode-heuristic-per-tensor • 23B • Updated
- inference-optimization/Qwen3-30B-A3B-Instruct-2507-5.5-bits-mode-noise-per-tensor • 21B • Updated
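The repo names above pair a bit-width with a mode (noise, hybrid, heuristic) and per-tensor granularity. As a generic illustration of what per-tensor quantization means — a single shared scale for an entire weight tensor, not this organization's actual pipeline — a minimal symmetric sketch in plain Python:

```python
def quantize_per_tensor(weights, bits=8):
    """Symmetric per-tensor quantization: one shared scale for the whole tensor.
    Illustrative only; real pipelines (FP8, NVFP4, fractional bit modes) differ."""
    qmax = 2 ** (bits - 1) - 1                  # e.g. 127 for 8 bits
    scale = max(abs(w) for w in weights) / qmax  # single scale per tensor
    q = [max(-qmax - 1, min(qmax, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Reconstruct approximate floats from integers and the shared scale."""
    return [v * scale for v in q]

# Toy "tensor" of four weights; the largest magnitude sets the scale.
w = [0.51, -1.27, 0.03, 0.89]
q, s = quantize_per_tensor(w, bits=8)   # q == [51, -127, 3, 89]
w_hat = dequantize(q, s)
```

Per-tensor schemes are the cheapest to store (one scale per tensor) but are sensitive to outlier weights, which is one motivation for the multiple calibration modes seen in the names above.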
Datasets (13)
- inference-optimization/laguna-xs-ultrachat-responses • Viewer • Updated • 208k
- inference-optimization/laguna-xs-ultrachat-conversations • Viewer • Updated • 205k
- inference-optimization/laguna-xs-magpie-300k-responses • Viewer • Updated • 300k
- inference-optimization/laguna-xs-magpie-300k-conversations • Viewer • Updated • 298k
- inference-optimization/Qwen3-8b-sharegpt-5k • Preview • Updated • 81
- inference-optimization/speculators_benchmarks_tool_call • Viewer • Updated • 4.9k • 63
- inference-optimization/speculators-qwen3-30b-a3b-instruct-2507 • Preview • Updated • 32
- inference-optimization/speculators-qwen3-30b-a3b-instruct • Preview • Updated • 58
- inference-optimization/speculators-qwen3-32b-instruct • Preview • Updated • 65
- inference-optimization/gpt-oss-20b-nan-hidden-states-repro • Updated • 52