# 🧪 Indica-1.7B: An Experimental Research Model 🇮🇳
> **NOTICE:** This is an experimental model released strictly for research and development purposes. It serves as a proof of concept for a four-stage post-training pipeline applied to Small Language Models (SLMs).
Indica-1.7B is a lightweight language model developed by Prashant to explore the limits of persona injection, cultural alignment, and reasoning behavior in ultra-small parameter architectures (1.7B).
Built on Qwen3-1.7B, the model was subjected to a rigorous post-training regime including Supervised Fine-Tuning (SFT), GRPO-based reasoning alignment, and Direct Preference Optimization (DPO).
## 🔬 Research Objective
This project investigates whether a 1.7B-parameter model can balance three traditionally competing objectives:
- **Domain Expertise:** Knowledge of Indian Law (IPC/BNS) and Agriculture.
- **Linguistic Persona:** Natural Hinglish/Hindi code-switching with a colloquial Indian tone.
- **Logic & Reasoning:** Use of an explicit internal reasoning trace via native `<think>` tags.
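When evaluating the model, the `<think>` trace needs to be separated from the visible answer. A minimal post-processing sketch (the sample string below is purely illustrative, not actual Indica-1.7B output):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning trace, visible answer) from a model response."""
    traces = THINK_RE.findall(text)
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(t.strip() for t in traces), answer

# Illustrative output shape only -- not a real model response.
sample = ("<think>User asks about the IPC section for theft.</think>"
          "Theft is covered under IPC Section 378.")
trace, answer = split_reasoning(sample)
print(trace)   # User asks about the IPC section for theft.
print(answer)  # Theft is covered under IPC Section 378.
```

An empty trace from this helper is also a cheap way to detect the "logic bypassing" failure mode described below, where the model skips the `<think>` block entirely.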
## 🛠️ Post-Training Pipeline
The model underwent a specialized four-stage alignment strategy:
- **Stage 1: SFT (Knowledge).** Supervised fine-tuning on Indian Law and Agriculture datasets.
- **Stage 2: GRPO (Reasoning).** Reinforcement learning that rewards structured reasoning using `<think>` tags.
- **Stage 3: DPO (Persona Alignment).** Preference optimization to shape a friendly, culturally grounded "Indian AI Assistant" identity.
- **Stage 4: Optimization & Export.** Exported with Unsloth for efficient GGUF-based local inference.
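The DPO stage consumes preference pairs. The record below is a hypothetical illustration of the prompt/chosen/rejected JSONL format commonly expected by DPO trainers (e.g. TRL's `DPOTrainer`), not the project's actual training data:

```python
import json

# Hypothetical preference pair: the "chosen" completion carries the
# desired Hinglish persona; the "rejected" one is a flat refusal.
pair = {
    "prompt": "Kal baarish hogi kya? Kheti ke liye plan karna hai.",
    "chosen": ("Haan ji! Mausam reports ke hisaab se kal halki baarish "
               "ho sakti hai, toh buvai thoda ruk ke karein."),
    "rejected": "I cannot predict the weather.",
}

# One JSON object per line in the training file.
line = json.dumps(pair, ensure_ascii=False)
record = json.loads(line)
print(sorted(record))  # ['chosen', 'prompt', 'rejected']
```

During optimization, the trainer pushes the policy's likelihood of the "chosen" response above that of the "rejected" one for the same prompt, which is how the persona gets reinforced.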
## Known Limitations & Experimental Findings (the "Alignment Tax")
As an experimental 1.7B-parameter model, Indica exhibits several important alignment-related trade-offs:
- **Factual Regression:** Due to limited parameter capacity, the final DPO stage introduces a loss of precision in mathematical reasoning and exact legal section numbering.
- **Persona Drift:** The model may prioritize its creative or conversational persona over strict technical accuracy, occasionally identifying itself as entities such as an "AI Zindagi Manager."
- **Logic Bypassing:** In some cases the model skips the internal `<think>` reasoning trace and responds directly, leading to incomplete or incorrect answers.
- **Repetition Loops:** Occasional repetition or gibberish output, particularly in long Hinglish conversations.
These behaviors are considered expected outcomes when aggressively aligning small models beyond their parameter limits.
## 📦 Deployment (For Testing & Research)
This model is best suited for:
- Studying Hinglish conversational behavior
- Exploring persona-alignment trade-offs
- Serving as a base for further fine-tuning experiments
### Local Inference with Ollama

```bash
ollama run hf.co/prash616/Indica-1.7B-GGUF
```
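For programmatic testing, Ollama exposes a local HTTP API (`/api/generate` on port 11434 by default) once the model is pulled. A minimal sketch using only the standard library, which degrades gracefully when no server is running:

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": "hf.co/prash616/Indica-1.7B-GGUF",
        "prompt": prompt,
        "stream": False,  # return a single JSON object, not a stream
    }

def generate(prompt: str):
    """Return the model's response text, or None if Ollama is unreachable."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, OSError):
        return None  # no local Ollama server reachable

reply = generate("Namaste! Aap kaun ho?")
print(reply if reply is not None else "Ollama server not running.")
```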
## 🤝 Credits & Acknowledgements
- **Developer:** Prashant (prash616)
- **Base Model:** Alibaba Qwen Team
- **Training Framework & Optimization:** Unsloth AI
## Disclaimer
This model is released strictly for educational and research purposes.
It should not be used for real-world legal, agricultural, or mathematical decision-making.
Indica-1.7B is an experimental exploration of how far cultural alignment and persona shaping can be pushed in small-scale language models, highlighting both their promise and their structural limits.