# ⚙️ Fine-Tuning Indonesian Legal QA with GRPO (In Progress)
This repository documents the real-world process of fine-tuning a language model on an Indonesian legal case QA dataset using Group Relative Policy Optimization (GRPO), a reinforcement learning technique optimized for language models.
Rather than a "proof of concept," this is a proof of practice:
- ✅ That fine-tuning legal QA models is achievable
- ✅ That free tools like Kaggle and Hugging Face are enough
- ✅ That you don't need a paid GPU or local compute to build powerful NLP systems
## 🚀 Fine-Tuning Workflow
- ✅ Model: Lightweight, efficient transformer-based model (ideal for legal QA)
- ✅ Technique: GRPO (Group Relative Policy Optimization)
- ✅ Compute: Kaggle GPU (30-hour quota / week)
- ✅ Pipeline: Cloud-to-cloud push/pull of checkpoints via Hugging Face Hub
- ✅ Epochs: Trained across 3 full epochs on the complete dataset
Checkpoints are updated incrementally and stored on the Hugging Face Hub for easy access and reproducibility.
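The core idea of GRPO can be illustrated in a few lines: for each prompt, a group of completions is sampled and scored by a reward function, and each completion's advantage is its reward relative to the rest of the group. Below is a minimal pure-Python sketch of that group-relative advantage step; the reward values and group size are made up for illustration, and an actual training run would use a library implementation such as `trl`'s `GRPOTrainer` rather than this code.

```python
# Sketch of GRPO's group-relative advantage: each sampled answer is scored,
# then normalized against the mean and std of its own group of samples.
from statistics import mean, stdev


def group_relative_advantages(rewards, eps=1e-8):
    """Advantage of each completion = (reward - group mean) / group std."""
    mu = mean(rewards)
    sigma = stdev(rewards) if len(rewards) > 1 else 0.0
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four sampled answers to one legal question, scored in [0, 1].
rewards = [0.2, 0.8, 0.5, 0.5]
advantages = group_relative_advantages(rewards)
# Above-average answers get a positive advantage (reinforced);
# below-average answers get a negative one (discouraged).
```

Because the baseline is computed from the group itself, no separate value model is needed, which is part of why GRPO fits within Kaggle's free-tier memory and compute limits.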
## 🛠️ Current Status
- 🟡 Training in progress
- 📈 GRPO working efficiently within Kaggle limits
- ⏱️ Approx. 72 hours projected for the full run, spread over roughly three weekly cycles of Kaggle's free 30-hour GPU quota
- ✅ All training, saving, and transfer operations are done via cloud tools; no local GPU needed
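Since a multi-day run has to span several Kaggle sessions, the cloud-to-cloud workflow amounts to pushing a checkpoint to the Hugging Face Hub at the end of one session and pulling it back at the start of the next. A sketch of that push/pull step using `huggingface_hub` is shown below; the repo id and folder layout are hypothetical, and the hub imports are done lazily inside the functions so the path helper works without the package installed.

```python
def checkpoint_folder(epoch: int) -> str:
    """Local folder where one epoch's checkpoint is saved (hypothetical layout)."""
    return f"checkpoints/epoch-{epoch}"


def push_checkpoint(epoch: int, repo_id: str = "your-username/indo-legal-qa-grpo") -> None:
    """End of a Kaggle session: upload one epoch's checkpoint folder to the Hub."""
    from huggingface_hub import HfApi  # requires `pip install huggingface_hub` + auth token

    HfApi().upload_folder(
        folder_path=checkpoint_folder(epoch),
        repo_id=repo_id,
        path_in_repo=f"epoch-{epoch}",
        commit_message=f"Checkpoint after epoch {epoch}",
    )


def pull_checkpoint(epoch: int, repo_id: str = "your-username/indo-legal-qa-grpo") -> str:
    """Start of the next session: download that checkpoint, return its local path."""
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, allow_patterns=[f"epoch-{epoch}/*"])
```

With this pattern, the only state that crosses sessions lives on the Hub, so any machine with a token can resume or reproduce the run.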
## 🎯 Why This Matters
This repository is a proof that:
- 🧑‍🎓 Students and researchers can fine-tune advanced models using only free cloud tools
- 🧪 GRPO can be applied in resource-constrained environments
- 🌍 Legal NLP is now more accessible than ever
- 🌐 Anyone, from anywhere, can participate in open AI development; no expensive infrastructure required
## 📦 What's Inside
- 🧠 A QA model fine-tuned with GRPO on an Indonesian legal case dataset
- 📁 Checkpoints from each training epoch
- 🔗 Compatible with `transformers`, `peft`, and RL libraries
- 📄 Simple logs and setup for reproducibility
## ⚠️ Disclaimer
This repository is for learning, research, and development only. The dataset is derived from public consultations and may not represent finalized legal interpretations.
## 🙏 Acknowledgments
- 🤗 Powered by Hugging Face
- 🧠 Trained with Kaggle Notebooks GPU
- 💡 Inspired by open-access ML for public good