Nepal Supreme Court Judgment Model Trained from Scratch Using the Gemma 3 270M Architecture

This model was pretrained from scratch on a dataset of Nepal Supreme Court judgments.

Model Details

Model: built from scratch
Tokenizer: google/gemma-3-270m-it
Architecture: Gemma 3 270M (270M parameters)
Training Data: Nepal Supreme Court judgments (1,400+ documents, 70k+ rows)
Context Length: 2048 tokens
Vocabulary Size: 256,000 tokens
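
As a rough sanity check on the 270M figure, the parameter count can be estimated from the dimensions given in the Model_CONFIG of this card. The sketch below assumes a Gemma 3 style layout that the card itself does not spell out: no bias terms, tied input and output embeddings, a gated MLP with three projection matrices, four RMSNorm weights per layer plus query and key norms, a single final norm, and `n_kv_groups` meaning the number of key/value heads. The function name `estimate_params` is hypothetical.

```python
def estimate_params(vocab_size=256_000, emb_dim=640, n_heads=4, n_kv_groups=1,
                    head_dim=256, hidden_dim=2048, n_layers=18):
    """Estimate parameter counts under the assumptions stated in the lead-in."""
    # Token embedding table (assumed to be shared with the output projection)
    embedding = vocab_size * emb_dim

    # Attention: query and output projections use all heads,
    # key and value projections use only the key/value groups
    q_and_o = 2 * emb_dim * n_heads * head_dim
    k_and_v = 2 * emb_dim * n_kv_groups * head_dim
    attention = q_and_o + k_and_v

    # Gated feed-forward block: gate, up, and down projections (assumed)
    mlp = 3 * emb_dim * hidden_dim

    # Four per-layer norms over emb_dim plus query/key norms over head_dim (assumed)
    norms = 4 * emb_dim + 2 * head_dim

    per_layer = attention + mlp + norms
    final_norm = emb_dim
    total = embedding + n_layers * per_layer + final_norm
    return {"embedding": embedding, "per_layer": per_layer, "total": total}


counts = estimate_params()
print(counts)
```

Under these assumptions the estimate comes to roughly 264M parameters, of which about 164M sit in the embedding table, so the vocabulary size dominates the total.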

Training Details

Framework: PyTorch (from-scratch implementation)
Optimizer: AdamW (lr=1e-4, weight_decay=0.1)
Scheduler: linear warmup followed by cosine decay
Precision: bfloat16/float16 mixed precision
Hardware: Tesla T4 GPU (Google Colab)
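
The scheduler entry above can be illustrated with a minimal sketch of a linear warmup followed by cosine decay. Only the peak learning rate of 1e-4 comes from this card; the warmup length, total step count, and minimum learning rate below are assumptions chosen for illustration, and `lr_at_step` is a hypothetical helper, not the training code that was actually used.

```python
import math


def lr_at_step(step, peak_lr=1e-4, warmup_steps=100, total_steps=1000, min_lr=1e-5):
    """Learning rate at a given optimizer step (0-indexed)."""
    if step < warmup_steps:
        # Linear warmup: ramp up to peak_lr by the last warmup step
        return peak_lr * (step + 1) / warmup_steps
    # Fraction of the decay phase completed, clamped to [0, 1]
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    progress = min(progress, 1.0)
    # Cosine factor goes from 1 (start of decay) down to 0 (end of training)
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_lr + (peak_lr - min_lr) * cosine


for step in (0, 99, 550, 1000):
    print(step, lr_at_step(step))
```

In a PyTorch loop, one way to apply such a schedule is to write the returned value into each optimizer parameter group's `lr` entry before every optimizer step.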

Model Architecture

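import torch
from transformers import AutoTokenizer

# Tokenizer named in Model Details; its vocabulary size sets the embedding table size
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

Model_CONFIG = {
    "vocab_size": tokenizer.vocab_size,  # match the tokenizer's vocabulary size (256000)
    "context_length": 2048,  # reduced context length to fit T4 GPU memory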
    "emb_dim": 640,
    "n_heads": 4,
    "n_layers": 18,
    "hidden_dim": 2048,
    "head_dim": 256,
    "qk_norm": True,
    "n_kv_groups": 1,
    "rope_local_base": 10_000.0,
    "rope_base": 1_000_000.0,
    "sliding_window": 512,
    "layer_types": [
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention",
        "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "sliding_attention", "full_attention"
    ],
    "dtype": torch.bfloat16,
    "query_pre_attn_scalar": 256,
}
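
The `layer_types` list in the config follows a regular pattern: five sliding-window layers, then one full-attention layer, repeated three times. The sketch below regenerates that pattern and shows what the two layer types could mean for the attention mask. It is an illustration under assumptions, not the author's implementation: the helper names are hypothetical, the window is taken to include the current token, and the pairing of `rope_local_base` with sliding layers and `rope_base` with full layers follows the usual Gemma 3 convention rather than anything stated in this card.

```python
def build_layer_types(n_layers=18, full_every=6):
    """Every `full_every`-th layer uses full attention, the rest use a sliding window."""
    return ["full_attention" if (i + 1) % full_every == 0 else "sliding_attention"
            for i in range(n_layers)]


def rope_base_for(layer_type, rope_local_base=10_000.0, rope_base=1_000_000.0):
    """Assumed mapping: local base for sliding layers, global base for full layers."""
    return rope_local_base if layer_type == "sliding_attention" else rope_base


def attention_mask(seq_len, layer_type, sliding_window=512):
    """mask[i][j] is True where query position i may attend to key position j."""
    mask = []
    for i in range(seq_len):
        row = []
        for j in range(seq_len):
            allowed = j <= i  # causal: never look at future tokens
            if layer_type == "sliding_attention":
                # Assumed window semantics: the last `sliding_window` tokens, self included
                allowed = allowed and (i - j) < sliding_window
            row.append(allowed)
        mask.append(row)
    return mask


layer_types = build_layer_types()
print(layer_types.count("sliding_attention"), layer_types.count("full_attention"))
print(attention_mask(4, "sliding_attention", sliding_window=2)[3])
```

With a context length of 2048 and a window of 512, the sliding layers see at most a quarter of the context per token, while the three full-attention layers can still see all of it.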

Usage

from transformers import AutoTokenizer
import torch

# Load tokenizer
tokenizer = AutoTokenizer.from_pretrained("google/gemma-3-270m-it")

# Load model: the architecture is custom, so implement it yourself or use the provided code
# (see the original implementation for the model architecture)

# Generate text
prompt = "सर्वोच्च अदालतको निर्णय अनुसार"
inputs = tokenizer(prompt, return_tensors="pt")
# ... generation code ...
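
The generation step is left open in the usage snippet above. As one possible shape for it, here is a minimal greedy decoding loop written against a generic callable, so it does not depend on the model class. Everything in it is an assumption for illustration: `greedy_generate`, `logits_fn`, and `toy_logits` are hypothetical names, and `logits_fn` stands in for whatever wrapper around the real model returns next-token scores for a list of token ids.

```python
def greedy_generate(logits_fn, input_ids, max_new_tokens=20, eos_id=None,
                    context_length=2048):
    """Append the highest-scoring next token until the budget or EOS is reached."""
    ids = list(input_ids)
    for _ in range(max_new_tokens):
        # Keep only the most recent tokens that fit in the context window
        window = ids[-context_length:]
        logits = logits_fn(window)  # one score per vocabulary entry
        next_id = max(range(len(logits)), key=logits.__getitem__)
        ids.append(next_id)
        if eos_id is not None and next_id == eos_id:
            break
    return ids


def toy_logits(ids, vocab_size=5):
    """Stand-in for a real model: always prefers (last token + 1) modulo vocab_size."""
    target = (ids[-1] + 1) % vocab_size
    return [1.0 if i == target else 0.0 for i in range(vocab_size)]


generated = greedy_generate(toy_logits, [0], max_new_tokens=3)
print(generated)
```

With the real model, `input_ids` would come from the tokenizer output shown above, and the returned ids could be turned back into text with `tokenizer.decode`.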
