βš–οΈ Fine-Tuning Indonesian Legal QA with GRPO (In Progress)

This repository documents the real-world fine-tuning process of an Indonesian legal case QA dataset using Group Relative Policy Optimization (GRPO) β€” a reinforcement learning technique optimized for language models.

Rather than a "proof of concept," this is a proof of practice:

βœ… That fine-tuning legal QA models is achievable βœ… That free tools like Kaggle and Hugging Face are enough βœ… That you don’t need a paid GPU or local compute to build powerful NLP systems


πŸ”„ Fine-Tuning Workflow

  • βœ… Model: Lightweight, efficient transformer-based model (ideal for legal QA)
  • βœ… Technique: GRPO (Group Relative Policy Optimization)
  • βœ… Compute: Kaggle GPU (30-hour quota / week)
  • βœ… Pipeline: Cloud-to-cloud push/pull of checkpoints via Hugging Face Hub
  • βœ… Epochs: Trained across 3 full epochs on the complete dataset

Checkpoints are updated incrementally and stored in Hugging Face for easy access and reproducibility.


πŸ› οΈ Current Status

  • 🟑 Training in progress
  • πŸ“ˆ GRPO working efficiently within Kaggle limits
  • ⏱️ Approx. 72 hours projected for full run (within Kaggle’s free GPU quota)
  • βœ… All training, saving, and transfer operations are done via cloud tools β€” no local GPU needed

🎯 Why This Matters

This repository is a proof that:

  • πŸ§‘β€πŸŽ“ Students and researchers can fine-tune advanced models using only free cloud tools
  • πŸ§ͺ GRPO can be applied in resource-constrained environments
  • πŸ“š Legal NLP is now more accessible than ever
  • 🌍 Anyone, from anywhere, can participate in open AI development β€” no expensive infrastructure required

πŸ“¦ What’s Inside

  • 🧠 Indonesian legal QA dataset fine-tuned with GRPO
  • πŸ“Š Checkpoints from each training epoch
  • πŸ” Compatible with transformers, peft, and RL libraries
  • πŸ“ Simple logs and setup for reproducibility

⚠️ Disclaimer

This repository is for learning, research, and development only. The dataset is derived from public consultations and may not represent finalized legal interpretations.


πŸ™ Acknowledgments


Downloads last month

-

Downloads are not tracked for this model. How to track
Inference Providers NEW
This model isn't deployed by any Inference Provider. πŸ™‹ Ask for provider support

Model tree for Azzindani/Qwen3_0.6B_Legal_Checkpoint

Finetuned
Qwen/Qwen3-0.6B
Finetuned
(427)
this model

Dataset used to train Azzindani/Qwen3_0.6B_Legal_Checkpoint