Models

22,167

Active filters: grpo

williyam/agentic-rag-aerospace-grpo

Text Generation • Updated 2 days ago • 88 • 2

munish0838/consultenv-qwen3b-grpo-lora

Text Generation • Updated 2 days ago • 22 • 2

UKPLab/SciRM-7B

Text Generation • 8B • Updated 15 days ago • 236 • 2

UKPLab/SciRM-Ref-7B

Text Generation • 8B • Updated 15 days ago • 219 • 1

mradermacher/Poe-8B-GLM5-Opus4.6-Sonnet4.5-Kimi-Grok-Gemini-3-pro-preview-HERETIC-GGUF

8B • Updated Mar 15 • 2.36k • 5

nvidia/EGM-8B

Image-Text-to-Text • 9B • Updated 18 days ago • 407 • 6

infraxa/Qwen3.5-Trading-Agent

Text Generation • 35B • Updated Mar 23 • 102 • 5

dennisonb/qwen25-tax-3b

Reinforcement Learning • 3B • Updated Mar 27 • 13 • 1

Semaj90/gemma4-e4b-legal-grpo

Text Generation • Updated 23 days ago • 62 • 1

hongli-zhan/MINT-empathy-Qwen3-4B

Text Generation • 4B • Updated 12 days ago • 1.06k • 3

mradermacher/SciRM-Ref-7B-GGUF

Text Generation • 8B • Updated 14 days ago • 633 • 1

mradermacher/SciRM-Ref-7B-i1-GGUF

Text Generation • 8B • Updated 13 days ago • 6.59k • 1

mradermacher/SciRM-7B-GGUF

Text Generation • 8B • Updated 13 days ago • 734 • 1

mradermacher/SciRM-7B-i1-GGUF

Text Generation • 8B • Updated 13 days ago • 1.85k • 1

jordanpainter/diallm-qwen-grpo-all

Text Generation • 8B • Updated 10 days ago • 430 • 1

mradermacher/diallm-qwen-grpo-all-GGUF

8B • Updated 9 days ago • 600 • 1

Godwinlyamba/sanity-gradients-opensource-god-environment-tourn-da8e132b7783f8ac-20260413-position-1-088ee08b

Text Generation • Updated 5 days ago • 89 • 1

peterxyz/forecasting-grpo-qwen3-8b-v6-bounded-max250

Updated 5 days ago • 1

uam-rl/qwen35-9b-typst-grpo-lora

Text Generation • Updated 4 days ago • 31 • 1

rroshann/sec-sentiment-sftgrpo-deepseek-14b

Text Generation • 15B • Updated 4 days ago • 153 • 1

SofiTesfay2010/scientific-reasoning-training

Updated 4 days ago • 1

munish0838/cenv-trl-grpo-v1

Text Generation • Updated 4 days ago • 19 • 1

lucifer0077/code-review-agent-grpo

Text Generation • Updated about 16 hours ago • 15 • 1

rishi38/smart_emergency

Updated 2 days ago • 1

DGXAI/gemma-3n-e2b-driftcall-lora

Text Generation • Updated 2 days ago • 57 • 1

creovateHQ/Qwen2.5-3B-Instruct_BrowserForge_Adapter

Text Generation • Updated 2 days ago • 7 • 1

pvs333/supergames-grpo

Text Generation • 2B • Updated 2 days ago • 87 • 1

Chun121/Qwen3-4B-RPG-Roleplay-V2

Text Generation • 4B • Updated Aug 24, 2025 • 15.9k • 52

onuryozcu/llama

Text Generation • 0.1B • Updated Mar 10, 2025 • 15

amiguel/promptTuning

8B • Updated Feb 16, 2025 • 2