Active filters: GRPO
alfredcs/gemma-3-27b-grpo-med-merged
Image-Text-to-Text
• Updated
alfredcs/gemma-3-27b-firstaid-icd10-merged
Image-Text-to-Text
• Updated
mradermacher/gemma-3-27b-firstaid-icd10-merged-GGUF
28B • Updated
• 43
jinlovespho/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 2
Sarahpa/spGRPO-135M-readability-2
Text Generation
• 0.1B • Updated
• 2
Text Generation
• 0.1B • Updated
• 1
tariktuna/Summarizer-Demo-SmolGRPO-135M
Text Generation
• 0.1B • Updated
Text Generation
• 0.1B • Updated
supermodelresearch/VAR-d16-GRPO-Aesthetic
Text-to-Image
• Updated
supermodelresearch/VAR-d30-GRPO-Aesthetic
Text-to-Image
• Updated
dzungever/SmolLM-135M-Instruct-GRPO
Text Generation
• 0.1B • Updated
ritwik098/SmolGRPO-360M-Ritwik
Text Generation
• 0.4B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
alfredcs/torchrun-medgemma-27b-grpo-merged
Image-Text-to-Text
• 27B • Updated
KhushalM/Qwen2.5-1.5B-GRPO-Complete
Text Generation
• 2B • Updated
Text Generation
• 0.1B • Updated
• 5
Mhammad2023/SmolGRPO-135M
Text Generation
• 0.1B • Updated
• 8
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 1
Text Generation
• 0.1B • Updated
• 4
mlx-community/VisualQuality-R1-7B-bf16
Reinforcement Learning
• Updated
• 7
mlx-community/VisualQuality-R1-7B-6bit
Reinforcement Learning
• Updated
• 7
mlx-community/VisualQuality-R1-7B-8bit
Reinforcement Learning
• Updated
• 7
mlx-community/VisualQuality-R1-7B-4bit
Reinforcement Learning
• Updated
• 15
• 1
Text Generation
• 0.1B • Updated
• 1
kavanmevada/SmolGRPO-135M
Text Generation
• 0.6B • Updated
• 1
kavanmevada/SmolGRPO-135M-adapter
Updated
Text Generation
• 0.1B • Updated
• 3
harikrushna2272/SmolGRPO-135M
Text Generation
• 0.1B • Updated
Text Generation
• 0.1B • Updated