# 🧪 Indica-1.7B: An Experimental Research Model 🇮🇳
> **NOTICE:** This is an experimental model released strictly for research and development purposes. It serves as a proof of concept for a four-stage post-training pipeline applied to Small Language Models (SLMs).
Indica-1.7B is a lightweight language model developed by Prashant to explore the limits of persona injection, cultural alignment, and reasoning behavior in ultra-small parameter architectures (1.7B).
Built on Qwen3-1.7B, the model was subjected to a rigorous post-training regime including Supervised Fine-Tuning (SFT), GRPO-based reasoning alignment, and Direct Preference Optimization (DPO).
## 🔬 Research Objective
This project investigates whether a 1.7B-parameter model can balance three traditionally competing objectives:
- **Domain Expertise:** Knowledge of Indian Law (IPC/BNS) and Agriculture.
- **Linguistic Persona:** Natural Hinglish/Hindi code-switching with a colloquial Indian tone.
- **Logic & Reasoning:** Use of an explicit internal reasoning trace via native `<think>` tags.
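When evaluating the model, the `<think>` trace needs to be separated from the visible answer. A minimal post-processing sketch (the sample string below is purely illustrative, not actual Indica-1.7B output):

```python
import re

THINK_RE = re.compile(r"<think>(.*?)</think>", re.DOTALL)

def split_reasoning(text: str) -> tuple[str, str]:
    """Return (reasoning trace, visible answer) from a model response."""
    traces = THINK_RE.findall(text)
    answer = THINK_RE.sub("", text).strip()
    return "\n".join(t.strip() for t in traces), answer

# Illustrative output shape only -- not a real model response.
sample = ("<think>User asks about the IPC section for theft.</think>"
          "Theft is covered under IPC Section 378.")
trace, answer = split_reasoning(sample)
print(trace)   # User asks about the IPC section for theft.
print(answer)  # Theft is covered under IPC Section 378.
```

An empty trace from this helper is also a cheap way to detect the "logic bypassing" failure mode described below, where the model skips the `<think>` block entirely.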
## 🛠️ Post-Training Pipeline
The model underwent a specialized four-stage alignment strategy:
- **Stage 1: SFT (Knowledge).** Supervised fine-tuning on Indian Law and Agriculture datasets.
- **Stage 2: GRPO (Reasoning).** Reinforcement learning that rewards structured reasoning using `<think>` tags.
- **Stage 3: DPO (Persona Alignment).** Preference optimization to shape a friendly, culturally grounded "Indian AI Assistant" identity.
- **Stage 4: Optimization & Export.** Exported with Unsloth for efficient GGUF-based local inference.
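The DPO stage consumes preference pairs. The record below is a hypothetical illustration of the prompt/chosen/rejected JSONL format commonly expected by DPO trainers (e.g. TRL's `DPOTrainer`), not the project's actual training data:

```python
import json

# Hypothetical preference pair: the "chosen" completion carries the
# desired Hinglish persona; the "rejected" one is a flat refusal.
pair = {
    "prompt": "Kal baarish hogi kya? Kheti ke liye plan karna hai.",
    "chosen": ("Haan ji! Mausam reports ke hisaab se kal halki baarish "
               "ho sakti hai, toh buvai thoda ruk ke karein."),
    "rejected": "I cannot predict the weather.",
}

# One JSON object per line in the training file.
line = json.dumps(pair, ensure_ascii=False)
record = json.loads(line)
print(sorted(record))  # ['chosen', 'prompt', 'rejected']
```

During optimization, the trainer pushes the policy's likelihood of the "chosen" response above that of the "rejected" one for the same prompt, which is how the persona gets reinforced.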
## Known Limitations & Experimental Findings (the "Alignment Tax")
As an experimental 1.7B-parameter model, Indica exhibits several important alignment-related trade-offs:
- **Factual Regression:** Due to limited parameter capacity, the final DPO stage introduces a loss of precision in mathematical reasoning and exact legal section numbering.
- **Persona Drift:** The model may prioritize its creative or conversational persona over strict technical accuracy, occasionally identifying itself as entities such as an "AI Zindagi Manager."
- **Logic Bypassing:** In some cases the model skips the internal `<think>` reasoning trace and responds directly, leading to incomplete or incorrect answers.
- **Repetition Loops:** Occasional repetition or gibberish output, particularly in long Hinglish conversations.
These behaviors are considered expected outcomes when aggressively aligning small models beyond their parameter limits.
## 📦 Deployment (For Testing & Research)
This model is best suited for:
- Studying Hinglish conversational behavior
- Exploring persona-alignment trade-offs
- Serving as a base for further fine-tuning experiments
### Local Inference with Ollama

```bash
ollama run hf.co/prash616/Indica-1.7B-GGUF
```
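For programmatic testing, Ollama exposes a local HTTP API (`/api/generate` on port 11434 by default) once the model is pulled. A minimal sketch using only the standard library, which degrades gracefully when no server is running:

```python
import json
import urllib.error
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"

def build_payload(prompt: str) -> dict:
    """Request body for Ollama's /api/generate endpoint."""
    return {
        "model": "hf.co/prash616/Indica-1.7B-GGUF",
        "prompt": prompt,
        "stream": False,  # return a single JSON object, not a stream
    }

def generate(prompt: str):
    """Return the model's response text, or None if Ollama is unreachable."""
    data = json.dumps(build_payload(prompt)).encode("utf-8")
    req = urllib.request.Request(
        OLLAMA_URL, data=data, headers={"Content-Type": "application/json"}
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return json.loads(resp.read())["response"]
    except (urllib.error.URLError, OSError):
        return None  # no local Ollama server reachable

reply = generate("Namaste! Aap kaun ho?")
print(reply if reply is not None else "Ollama server not running.")
```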
## 🤝 Credits & Acknowledgements
- **Developer:** Prashant (prash616)
- **Base Model:** Alibaba Qwen Team
- **Training Framework & Optimization:** Unsloth AI
## Disclaimer
This model is released strictly for educational and research purposes.
It should not be used for real-world legal, agricultural, or mathematical decision-making.
Indica-1.7B is an experimental exploration of how far cultural alignment and persona shaping can be pushed in small-scale language models, highlighting both their promise and their structural limits.