Models

177

Active filters: GRPO

DannieAI/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 3, 2025 • 2

zijie0304/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 16, 2025

chanhyeok/SmolGRPO-135M

Text Generation • 0.1B • Updated Dec 30, 2025 • 1

Eubiota/eubiota-planner-8b

Text Generation • Updated 6 days ago • 295 • 1

7beshoyarnest/fine_tuned_SmolGRPO-135M_using_GRPO

Text Generation • 0.1B • Updated Jan 22 • 7

halxj/Devjalx-4b

Text Generation • 4B • Updated Jan 24 • 1

AnnLo/SmolGRPO-135M

Text Generation • 0.1B • Updated Jan 26

Cacciatore2023/SmolGRPO-135M

Text Generation • 0.1B • Updated Jan 26

Cacciatore2023/SmolGRPO-135M-v2

Text Generation • 0.1B • Updated Jan 26 • 3

kunjcr2/stablelm-1.6b-finetuned-aligned

Text Generation • Updated Jan 28

mradermacher/SmolGRPO-135M-v2-GGUF

0.1B • Updated Jan 28 • 28

chrisluo5311/Qwen2.5-7B-Instruct-SFT-GRPO-Merged-ROI

8B • Updated Jan 29 • 2

Supreeth/searchlm-qwen2.5-3b-rlhf

Text Generation • 3B • Updated Jan 31 • 2

mradermacher/eubiota-planner-8b-GGUF

Reinforcement Learning • 8B • Updated 28 days ago • 483

mradermacher/eubiota-planner-8b-i1-GGUF

Reinforcement Learning • 8B • Updated 28 days ago • 757

Perditio/SmolGRPO-135M

Text Generation • 0.1B • Updated 26 days ago • 10

OpenDataArena/ODA-Fin-RL-8B

Reinforcement Learning • Updated 1 day ago

syanwang/SmolGRPO-135M

Text Generation • 0.1B • Updated 16 days ago • 9

npallewela/Qwen-0.5-RL-tune

Text Generation • 0.5B • Updated 15 days ago • 11

npallewela/SmolGRPO-135M

Text Generation • 0.1B • Updated 15 days ago • 7

npallewela/Qwen-0.5B-moral_social_emh

Text Generation • 0.5B • Updated 14 days ago • 23

npallewela/Qwen-1.5B-moral_social_emh

Text Generation • 2B • Updated 14 days ago • 14

npallewela/Qwen-1.5B-moral_social_ed_1

Text Generation • 2B • Updated 13 days ago • 9

npallewela/Qwen-1.5B-moral_social_all

Text Generation • 2B • Updated 11 days ago • 14

npallewela/Qwen-1.5B-moral_social_all_1

Text Generation • 2B • Updated 8 days ago • 14

tyodd/SafeGuard-VL-RL

Image-Text-to-Text • 849k • Updated 6 days ago • 33

npallewela/Qwen-1.5B-moral_social_all_2

Text Generation • 2B • Updated 2 days ago • 20