MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs
🔗 Paper link: arXiv preprint
🔗 GitHub link: Training and evaluation code
🔗 Link to the trained models: Hugging Face collection
- Algorithm: GRPO
- Training data: GSM8K
- Base model: unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit
The INFTYTHINK architecture, shown below, allows multi-round thinking for extended LLM reasoning beyond its context size.
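As a rough illustration of multi-round thinking, the loop below is a minimal sketch, not the authors' implementation. It assumes each round sees only the question plus a carry-over from the previous round, and that a round signals completion by emitting `\boxed{}`. The names `multi_round_answer`, `generate`, and `max_rounds`, as well as the prompt wording, are hypothetical.

```python
import re
from typing import Callable, Optional

# Matches a simple \boxed{...} with no nested braces (an illustrative simplification).
BOXED = re.compile(r"\\boxed\{([^{}]*)\}")


def multi_round_answer(
    question: str,
    generate: Callable[[str], str],  # hypothetical: maps a prompt to model output text
    max_rounds: int = 3,
) -> Optional[str]:
    """Run up to max_rounds of inference, carrying only the previous round's output forward."""
    carry_over = ""
    for _ in range(max_rounds):
        # Assumption: each round's prompt holds the question and the latest carry-over,
        # so the prompt does not grow with the full history of earlier rounds.
        if carry_over:
            prompt = f"{question}\n\nProgress so far: {carry_over}"
        else:
            prompt = question
        output = generate(prompt)
        match = BOXED.search(output)
        if match:
            # A boxed value is treated as the final answer and ends the loop.
            return match.group(1).strip()
        carry_over = output
    return None  # no final answer within the round budget
```

Because each prompt contains only the latest carry-over, the total amount of reasoning across rounds can exceed what fits in a single context window.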
In this work, we propose a GRPO-based training method for such a system. It computes the accuracy reward by rolling out trajectories and applies that reward to the first-round inference outcomes. This is depicted as follows:
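The reward computation described above can be sketched in plain Python. This is a simplified sketch under stated assumptions, not the training code from the repository: `continue_rollout` and `is_correct` are hypothetical callables, the number of rollouts is arbitrary, and the advantage uses the common GRPO group normalization with a population standard deviation.

```python
from statistics import mean, pstdev
from typing import Callable, List


def first_round_rewards(
    first_round_outputs: List[str],
    continue_rollout: Callable[[str], str],  # hypothetical: runs later rounds, returns a final answer
    is_correct: Callable[[str], bool],       # hypothetical: checks a final answer against ground truth
    num_rollouts: int = 4,
) -> List[float]:
    """Score each first-round output by the accuracy of trajectories rolled out from it."""
    rewards = []
    for output in first_round_outputs:
        finals = [continue_rollout(output) for _ in range(num_rollouts)]
        # Fraction of rollouts that end in a correct answer, credited to the first round.
        rewards.append(mean([1.0 if is_correct(f) else 0.0 for f in finals]))
    return rewards


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group: subtract the group mean, divide by the group std."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; implementations differ on this
    return [(r - mu) / (sigma + eps) for r in rewards]
```

When every output in a group receives the same reward, all advantages are zero, so that group contributes no policy-gradient signal.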
Results
Our results are shown below:
Usage
from transformers import AutoModelForCausalLM
from peft import PeftModel

# Load the 4-bit base model listed above, then attach the PEFT adapter from this repository
base_model = AutoModelForCausalLM.from_pretrained("unsloth/qwen2.5-3b-instruct-unsloth-bnb-4bit")
model = PeftModel.from_pretrained(base_model, "purbeshmitra/vanillaGRPO")

# Adjacent string literals are concatenated, so the prompt keeps its line break without a syntax error
SYSTEM_PROMPT = (
    "You are a helpful assistant. When the user asks a question, you first think about the reasoning process in mind and then provide the user with an answer. The reasoning process and the answer are enclosed within <reasoning> </reasoning> and <answer> </answer> tags, respectively. In your answer, you also enclose your final answer in the box: \\boxed{}. Therefore, you respond in the following strict format:\n"
    "<reasoning> reasoning process here </reasoning> <answer> answer here </answer>."
)
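Since the system prompt asks for a strict `<reasoning>` / `<answer>` format with a `\boxed{}` final answer, a small parser can pull those fields out of the generated text. The helper below is a minimal sketch; `parse_response` is a hypothetical name and not part of the released code, and it handles only boxed values without nested braces.

```python
import re
from typing import Dict, Optional


def parse_response(text: str) -> Dict[str, Optional[str]]:
    """Extract the reasoning, the answer, and the boxed final answer from a model response."""
    reasoning = re.search(r"<reasoning>(.*?)</reasoning>", text, re.DOTALL)
    answer = re.search(r"<answer>(.*?)</answer>", text, re.DOTALL)
    # Look for \boxed{...} only inside the answer block; nested braces are not handled.
    boxed = re.search(r"\\boxed\{([^{}]*)\}", answer.group(1)) if answer else None
    return {
        "reasoning": reasoning.group(1).strip() if reasoning else None,
        "answer": answer.group(1).strip() if answer else None,
        "boxed": boxed.group(1).strip() if boxed else None,
    }
```

A response that does not follow the format yields `None` for the missing fields, which makes malformed outputs easy to filter during evaluation.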
Citation
If you find our work useful, consider citing it as:
@article{mitra2025motif,
title={MOTIF: Modular Thinking via Reinforcement Fine-tuning in LLMs},
author={Mitra, Purbesh and Ulukus, Sennur},
journal={arXiv preprint arXiv:2507.02851},
year={2025}
}