---
license: apache-2.0
pipeline_tag: text-generation
library_name: transformers
---

# Param-1

**BharatGen** introduces **Param-1**, a bilingual language model pretrained from scratch on English and Hindi. With 2.9 billion parameters, it serves as a foundational model for text completion. **Param-1** outperforms leading models like **LLaMA-3.2B**, **Gemma-2B**, **Granite-2B**, and **Granite-3B** on various standard benchmarks. This early release includes inference support via **NVIDIA NeMo**.

---

## 🚀 Model Inference

```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import torch

# Load tokenizer and model
model_name = "bharatgenai/Param-1"
tokenizer = AutoTokenizer.from_pretrained(model_name, trust_remote_code=False)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    trust_remote_code=True,
    # bf16 on GPU; fall back to full-precision float32 on CPU
    torch_dtype=torch.bfloat16 if torch.cuda.is_available() else torch.float32,
    device_map="auto",
)

# Tokenize the prompt and move the tensors to the model's device
prompt = "Your prompt here."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)

# Generate output with sampling (no gradients needed at inference time)
with torch.no_grad():
    output = model.generate(
        **inputs,
        max_new_tokens=300,
        do_sample=True,
        top_k=50,
        top_p=0.95,
        temperature=0.6,
        eos_token_id=tokenizer.eos_token_id,
        use_cache=False,
    )

# Decode the full sequence (prompt plus completion) back to text
generated_text = tokenizer.decode(output[0], skip_special_tokens=True)
print("Generated Text:\n", generated_text)
```

---

## 📊 Benchmarks

| Task | **Param-1 (PT)** |
|------|------------------|
| ARC Challenge | 53.6 (few) |
| ARC Easy | 74.2 (few) |
| HellaSwag | 73.8 (few) |
| HellaSwag Hi | 43.1 (few) |
| MMLU En | 46.2 (few) |
| MMLU Hi | 34.6 (few) |
| TriviaQA | 42.8 |
| TruthfulQA - Gen (BLEU) | 37.3 |
| TruthfulQA - MC1 Acc | 28.4 |
| TruthfulQA - MC2 Acc | 42.9 |
| PIQA | 79.2 |
| SuperGLUE - WiC | 50.6 |
| SuperGLUE - WSC | 52.9 |
| SuperGLUE - boolq | 72.6 |
| SuperGLUE - rte | 66.8 |

> **Notes:**
> - **PT**: Pre-Trained
> - **en-hi**: English-Hindi
> - **(few)**: few-shot evaluation
> - Pre-trained on **5 trillion tokens**

---

## 🧠 Model Architecture

- Hidden size: 2048
- Intermediate size: 7168
- Number of attention heads: 16
- Number of hidden layers: 32
- Number of key-value heads: 8
- Maximum position embeddings: 2048
- Activation function: **SiLU**
- Positional embeddings: **Rotary (RoPE)** with `rope_theta=10000.0`
- Attention: **Grouped-query attention**
- Precision: **bf16-mixed**

---

## 🏗️ Training Details

- **Training Infrastructure**: Yotta's Shakti Cloud
- **Hardware**: 512 NVIDIA H100 GPUs
- **Framework**: NVIDIA NeMo

---
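
To make the architecture figures listed above more concrete, the following is a minimal sketch that derives a few quantities from them: the per-head dimension, the number of query heads sharing each key-value head under grouped-query attention, and a rough key-value cache size when caching is enabled. The `ArchConfig` class and its method names are hypothetical helpers written for illustration, not part of the released model code. The sketch assumes the head dimension equals hidden size divided by the number of attention heads and that cached values are stored in bf16 (2 bytes each); neither is stated in this card.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ArchConfig:
    """Hypothetical container for the hyperparameters listed in this card."""
    hidden_size: int = 2048
    num_attention_heads: int = 16
    num_key_value_heads: int = 8
    num_hidden_layers: int = 32
    max_position_embeddings: int = 2048

    @property
    def head_dim(self) -> int:
        # Assumption: head_dim = hidden_size / num_attention_heads
        return self.hidden_size // self.num_attention_heads

    @property
    def queries_per_kv_head(self) -> int:
        # Grouped-query attention: several query heads share one K/V head
        return self.num_attention_heads // self.num_key_value_heads

    def kv_cache_bytes(self, num_tokens: int, bytes_per_value: int = 2) -> int:
        # Two cached tensors (K and V) per layer, each of shape
        # [num_key_value_heads, num_tokens, head_dim]
        per_token = 2 * self.num_hidden_layers * self.num_key_value_heads * self.head_dim
        return per_token * num_tokens * bytes_per_value


cfg = ArchConfig()
print("head_dim:", cfg.head_dim)
print("query heads per KV head:", cfg.queries_per_kv_head)
print("KV cache per token (KiB):", cfg.kv_cache_bytes(1) // 1024)
print("KV cache at full context (MiB):",
      cfg.kv_cache_bytes(cfg.max_position_embeddings) // 2**20)
```

Under these assumptions, halving the number of key-value heads relative to the attention heads halves the cache footprint compared with standard multi-head attention of the same width.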