jaygala24's Collections
RL post-training
Updated 11 days ago
jaygala24/Qwen3-4B-GRPO-KL-math-reasoning • Text Generation • 4B • Updated 5 days ago • 1.01k
jaygala24/Qwen3-4B-GRPO-math-reasoning • Text Generation • 4B • Updated 5 days ago • 851
jaygala24/Qwen3-4B-ReMax-math-reasoning • Text Generation • 4B • Updated 5 days ago • 798
jaygala24/Qwen3-1.7B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 5 days ago • 801
jaygala24/Qwen3-1.7B-GRPO-math-reasoning • Text Generation • 2B • Updated 5 days ago • 810
jaygala24/Qwen3-1.7B-ReMax-math-reasoning • Text Generation • 2B • Updated 5 days ago • 855
jaygala24/Qwen2.5-3B-GRPO-KL-math-reasoning • Text Generation • 3B • Updated 5 days ago • 765
jaygala24/Qwen2.5-3B-GRPO-math-reasoning • Text Generation • 3B • Updated 5 days ago • 790
jaygala24/Qwen2.5-3B-ReMax-math-reasoning • Text Generation • 3B • Updated 5 days ago • 438
jaygala24/Qwen2.5-1.5B-GRPO-KL-math-reasoning • Text Generation • 2B • Updated 5 days ago • 500
jaygala24/Qwen2.5-1.5B-GRPO-math-reasoning • Text Generation • 2B • Updated 5 days ago • 551
jaygala24/Qwen2.5-1.5B-ReMax-math-reasoning • Text Generation • 2B • Updated 5 days ago • 424
jaygala24/Qwen2.5-0.5B-GRPO-KL-math-reasoning • Text Generation • 0.5B • Updated 5 days ago • 521
jaygala24/Qwen2.5-0.5B-GRPO-math-reasoning • Text Generation • 0.5B • Updated 5 days ago • 550
jaygala24/Qwen2.5-0.5B-ReMax-math-reasoning • Text Generation • 0.5B • Updated 5 days ago • 433