DiiaLLM-270m-it-sft-full-21-10
Base model: google/gemma-3-270m-it
Task: Supervised fine-tuning (SFT) for a Ukrainian e-government assistant (Diia)
Primary language: Ukrainian (uk)
Release date: 2025-10-21
Owner: dovcharenko (private use)
A compact (270M) instruction-tuned chat model specialized for Diia product support: concise guidance, portal/app navigation, and polite clarifications in Ukrainian. Designed to pair well with retrieval (RAG) and tools for time-sensitive facts (fees, phone numbers, region lists, eligibility rules).
TL;DR
- Why this model? Lightweight helper for Ukrainian public-service flows where latency/cost matter.
- Good at: Clear, short answers; step-by-step “how to” instructions; polite confirmations; follow-ups.
- Not for: Legal authority or live policy updates — use RAG/tools for that.
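Since the card recommends pairing the model with retrieval for time-sensitive facts, here is a minimal sketch of that pairing. It is an illustration only: `retrieve` is a hypothetical helper standing in for your own search over Diia help-center content, and the prompt wording is not part of this release.

```python
# Minimal RAG-style prompt assembly (sketch). `retrieve` is a hypothetical
# helper that returns the top-k relevant Diia help-center snippets.
def build_rag_chat(question: str, retrieve, k: int = 3) -> list[dict]:
    snippets = retrieve(question, k=k)  # e.g. current fees, phone numbers, region lists
    context = "\n\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    user_msg = (
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer using only the facts from the context."
    )
    return [{"role": "user", "content": user_msg}]
```

The returned chat list drops straight into `tokenizer.apply_chat_template`, exactly as in the Quick Start below.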
Evaluation (LLM-as-Judge, Ukrainian)
Evaluation on 100 internal prompts (diia_eval.jsonl) using a Ukrainian LLM-as-Judge with two criteria (1–10):
- Accuracy & Completeness
- Clarity & Fluency (answers not in Ukrainian are penalized)
| Model | N (accuracy) | Accuracy mean | Accuracy median | N (clarity) | Clarity mean | Clarity median |
|---|---|---|---|---|---|---|
| dovcharenko/DiiaLLM-270m-it-sft-full-21-10 | 100 | 5.52 | 6 | 100 | 7.39 | 8 |
| google/gemma-3-4b-it | 98 | 4.91 | 5 | 98 | 7.24 | 8 |
Method. For each (question, reference) pair, the model generated an answer; a judge model produced strict-JSON scores.
Caveat. LLM-as-Judge correlates with human ratings but does not replace human review—especially for sensitive/edge cases.
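The judging protocol above can be approximated with a small scoring helper. This is a sketch, not the published evaluation script: the prompt is an English paraphrase (the actual judge prompt was in Ukrainian), and `judge_generate` stands in for whatever judge model or API is used.

```python
import json

# English paraphrase of the judging instructions; the real judge prompt is Ukrainian.
JUDGE_PROMPT = """Rate the assistant's answer on two criteria from 1 to 10:
1) Accuracy & Completeness versus the reference answer.
2) Clarity & Fluency in Ukrainian (non-Ukrainian answers are penalized).
Return strict JSON only: {{"accuracy": <1-10>, "clarity": <1-10>}}

Question: {question}
Reference: {reference}
Assistant answer: {answer}"""

def judge_one(question, reference, answer, judge_generate):
    # `judge_generate` is a hypothetical callable wrapping the judge model/API.
    raw = judge_generate(JUDGE_PROMPT.format(
        question=question, reference=reference, answer=answer))
    try:
        scores = json.loads(raw)          # strict-JSON scores, as described above
        return scores["accuracy"], scores["clarity"]
    except (json.JSONDecodeError, KeyError):
        return None, None                 # unparseable judgments can simply be skipped
```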
Intended Use
- Use cases: FAQ-style guidance for Diia services (e.g., FOP/sole proprietor, documents, social assistance, vehicle services), step-by-step navigation, clarifying questions, courteous closings.
- Out of scope: Binding legal advice; case-specific decisions; storing personal data without proper safeguards.
Training Summary
- Base: `google/gemma-3-270m-it` (chat)
- Objective: SFT on Ukrainian dialogs from Diia support scenarios
- Formatting: Gemma-3 chat template via `tokenizer.apply_chat_template`
- Hardware: Colab GPU (L4/T4; bf16 preferred if available)
- Typical recipe (see the sketch after the data note below):
  - Context length: 2048
  - Small per-device batch (use gradient accumulation)
  - Epochs: 1–2 (monitor for overfitting)
  - LR: 2e-4 (cosine), warmup 3%
  - Gradient checkpointing: on
- Optional: LoRA (`q_proj`, `k_proj`, `v_proj`, `o_proj`, `gate_proj`, `up_proj`, `down_proj`) for lower VRAM and safer iteration
Data note: Prefer single-turn instruction→answer pairs or curated multi-turn where context is essential. Avoid excessive chit-chat unless it reflects desired behavior.
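A hedged sketch of this recipe with TRL's `SFTTrainer` follows. The dataset file name and column layout are placeholders, the exact hyperparameters behind this checkpoint are not published, and some argument names (e.g. the max-sequence-length field) vary slightly across TRL versions.

```python
from datasets import load_dataset
from peft import LoraConfig
from trl import SFTConfig, SFTTrainer

# Placeholder dataset: one chat per row, e.g.
# {"messages": [{"role": "user", "content": "..."}, {"role": "assistant", "content": "..."}]}
dataset = load_dataset("json", data_files="diia_sft.jsonl", split="train")

peft_config = LoraConfig(  # optional LoRA, per the recipe above
    r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM",
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
)

args = SFTConfig(
    output_dir="diia-sft",
    max_seq_length=2048,               # context length (field name differs in newer TRL)
    num_train_epochs=2,                # 1–2 epochs, watch for overfitting
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    per_device_train_batch_size=2,     # small batch + gradient accumulation
    gradient_accumulation_steps=8,
    gradient_checkpointing=True,
    bf16=True,                         # if the GPU supports it
)

trainer = SFTTrainer(
    model="google/gemma-3-270m-it",    # base model; TRL applies the Gemma-3 chat template
    args=args,
    train_dataset=dataset,
    peft_config=peft_config,
)
trainer.train()
```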
Quick Start (Inference)
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

MODEL_ID = "dovcharenko/DiiaLLM-270m-it-sft-full-21-10"  # private repo

tok = AutoTokenizer.from_pretrained(MODEL_ID)
if tok.pad_token_id is None:
    tok.pad_token_id = tok.eos_token_id  # reuse EOS as pad if none is set

# Prefer bf16 on Ampere+ GPUs, fp16 on older GPUs, fp32 on CPU
if torch.cuda.is_available():
    dtype = torch.bfloat16 if torch.cuda.get_device_capability(0)[0] >= 8 else torch.float16
else:
    dtype = torch.float32

model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    torch_dtype=dtype,
    device_map="auto",
).eval()

# "How do I file a zero tax declaration as a FOP (sole proprietor)?"
chat = [{"role": "user", "content": "Як подати нульову декларацію ФОП?"}]
prompt = tok.apply_chat_template(chat, tokenize=False, add_generation_prompt=True)
inputs = tok(prompt, return_tensors="pt").to(model.device)

with torch.inference_mode():
    out = model.generate(
        **inputs,
        max_new_tokens=192,
        do_sample=False,  # deterministic for prod
        eos_token_id=tok.eos_token_id,
        pad_token_id=tok.pad_token_id,
        use_cache=True,
    )

# Decode only the newly generated tokens (skip the prompt)
answer = tok.decode(out[0, inputs["input_ids"].shape[1]:], skip_special_tokens=True).strip()
print(answer)
```
Evaluation results
- Accuracy & Completeness (LLM-as-Judge) on diia_eval (internal)test set self-reported5.520
- Clarity & Fluency (LLM-as-Judge) on diia_eval (internal)test set self-reported7.390