🧪 Indica-1.7B: An Experimental Research Model 🇮🇳

NOTICE: This is an experimental model released strictly for research and development purposes.
It serves as a proof-of-concept for a four-stage post-training pipeline applied to Small Language Models (SLMs).

Indica-1.7B is a lightweight language model developed by Prashant to explore the limits of persona injection, cultural alignment, and reasoning behavior in an ultra-small architecture (1.7B parameters).

Built on Qwen3-1.7B, the model was subjected to a rigorous post-training regime including Supervised Fine-Tuning (SFT), GRPO-based reasoning alignment, and Direct Preference Optimization (DPO).


🔬 Research Objective

This project investigates whether a 1.7B-parameter model can balance three traditionally competing objectives:

  1. Domain Expertise
    Knowledge of Indian Law (IPC/BNS) and Agriculture.

  2. Linguistic Persona
Natural Hinglish/Hindi code-switching with a colloquial Indian tone.

  3. Logic & Reasoning
Use of an explicit internal reasoning trace via native <think> tags.
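
The native reasoning trace above can be handled programmatically; a minimal sketch, assuming Qwen3-style <think>…</think> tags and a hypothetical Hinglish completion:

```python
import re

def split_reasoning(text: str) -> tuple[str, str]:
    """Separate a Qwen3-style <think>...</think> trace from the final answer.

    Returns (reasoning, answer); reasoning is empty when the model
    skips the trace entirely.
    """
    match = re.search(r"<think>(.*?)</think>", text, flags=re.DOTALL)
    if match is None:
        return "", text.strip()
    return match.group(1).strip(), text[match.end():].strip()

# Hypothetical Hinglish completion for illustration:
raw = "<think>User ne IPC section poocha hai.</think>Yeh dhara chori se related hai."
reasoning, answer = split_reasoning(raw)
```

Separating the trace this way also makes it easy to detect when the model bypasses its reasoning step (an empty reasoning string).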


๐Ÿ› ๏ธ Post-Training Pipeline

The model underwent a specialized four-stage alignment strategy:

  • Stage 1: SFT (Knowledge)
    Supervised fine-tuning on Indian Law and Agriculture datasets.

  • Stage 2: GRPO (Reasoning)
    Reinforcement learning to reward structured reasoning using <think> tags.

  • Stage 3: DPO (Persona Alignment)
Preference optimization to shape a friendly, culturally grounded “Indian AI Assistant” identity.

  • Stage 4: Optimization & Export
    Exported using Unsloth for efficient GGUF-based local inference.
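
Stage 2's reward signal can be approximated by a simple format check; the sketch below is illustrative only (the actual reward function and weights used to train Indica are not published, so the names and values here are placeholders):

```python
def format_reward(completion: str) -> float:
    """Illustrative GRPO-style format reward: score a completion higher
    when it contains exactly one well-formed <think>...</think> trace
    followed by a non-empty final answer.

    The 0.5 weights are placeholders, not the values used to train Indica.
    """
    score = 0.0
    opens = completion.count("<think>")
    closes = completion.count("</think>")
    if opens == 1 and closes == 1 and completion.index("<think>") < completion.index("</think>"):
        score += 0.5  # well-formed, single reasoning trace
        answer = completion.split("</think>", 1)[1].strip()
        if answer:
            score += 0.5  # a visible answer follows the trace
    return score
```

A GRPO trainer would compute a reward like this per sampled completion and normalize scores within each group before the policy update.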


📉 Known Limitations & Experimental Findings (The “Alignment Tax”)

As an experimental 1.7B-parameter model, Indica exhibits several important alignment-related trade-offs:

  • Factual Regression
Owing to limited parameter capacity, the final DPO stage degrades precision in mathematical reasoning and in recalling exact legal section numbers.

  • Persona Drift
The model may prioritize its creative or conversational persona over strict technical accuracy, occasionally identifying itself as entities such as an “AI Zindagi Manager.”

  • Logic Bypassing
    In some cases, the model may skip the internal <think> reasoning trace and respond directly, leading to incomplete or incorrect answers.

  • Repetition Loops
    Occasional repetition or gibberish outputs may occur, particularly in long Hinglish conversations.

These behaviors are considered expected outcomes when aggressively aligning small models beyond their parameter limits.


📦 Deployment (For Testing & Research)

This model is best suited for:

  • Studying Hinglish conversational behavior
  • Exploring persona-alignment trade-offs
  • Serving as a base for further fine-tuning experiments

Local Inference with Ollama

ollama run hf.co/prash616/Indica-1.7B-GGUF
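
Beyond the CLI, Ollama also serves a local HTTP API (default port 11434); a minimal Python sketch that builds a non-streaming chat request, assuming the model tag above has already been pulled:

```python
import json
import urllib.request

def build_chat_request(prompt: str,
                       model: str = "hf.co/prash616/Indica-1.7B-GGUF",
                       host: str = "http://localhost:11434") -> urllib.request.Request:
    """Build a request for Ollama's /api/chat endpoint (non-streaming)."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

# With a local Ollama server running:
#   resp = urllib.request.urlopen(build_chat_request("Namaste! IPC 378 kya hai?"))
#   print(json.loads(resp.read())["message"]["content"])
```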

๐Ÿค Credits & Acknowledgements

  • Developer: Prashant (prash616)
  • Base Model: Alibaba Qwen Team
  • Training Framework & Optimization: Unsloth AI

Disclaimer

This model is released strictly for educational and research purposes.
It should not be used for real-world legal, agricultural, or mathematical decision-making.

Indica-1.7B is an experimental exploration of how far cultural alignment and persona shaping can be pushed in small-scale language models, highlighting both their promise and their structural limits.
