---
library_name: transformers
license: gpl-3.0
language:
- as
- bn
- brx
- doi
- gom
- gu
- en
- hi
- kn
- ks
- mai
- ml
- mni
- mr
- ne
- or
- pa
- sa
- sat
- sd
- ta
- te
- ur
base_model:
- google/gemma-3-4b-it
base_model_relation: finetune
pipeline_tag: translation
---

# Sarvam-Translate
<p align="center">
  <a href="https://dashboard.sarvam.ai/translate"
     target="_blank" rel="noopener noreferrer">
    <img
      src="https://img.shields.io/badge/🚀 Try on Sarvam&nbsp;Playground-1488CC?style=for-the-badge&logo=rocket"
      alt="Try on Sarvam Playground"
    />
  </a>
</p>

Sarvam-Translate is a translation model built by Sarvam AI in partnership with AI4Bharat and fine-tuned from Gemma3-4B-IT. It is designed for document-level translation across the 22 official Indian languages. Rather than translating isolated sentences, it handles long-context inputs, diverse content types, and a variety of formats. Sarvam-Translate aims to provide high-quality, context-aware translations for Indian languages, which have traditionally lagged behind high-resource languages in LLM performance.

Learn more about Sarvam-Translate in our [blog post](https://www.sarvam.ai/blogs/sarvam-translate).

## Key Features
- **Comprehensive Indian Language Support**: Focuses on the 22 official Indian languages, aiming for nuanced and accurate translations.
- **Advanced Document-Level Translation**: Translates entire documents, web pages, speeches, textbooks, and scientific articles, not just isolated sentences. The maximum context length is 8k tokens.
- **Versatile Format Handling**: Processes a wide range of input formats, including markdown, digitized content (handling OCR errors), documents with embedded math and chemistry equations, and code files (translating only comments).
- **Context-Aware & Inclusive**: Engineered to respect different contexts, formats, and styles (formal/informal), and to ensure inclusivity (e.g., appropriate gender attribution).

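Because the context window is limited to 8k tokens, longer documents may need to be split before translation. The following is a minimal sketch of one way to do that, not part of the official tooling: it greedily packs blank-line-separated paragraphs into chunks under a token budget. The names `chunk_paragraphs` and `whitespace_count` are hypothetical, the 4096-token default budget is an assumption that leaves room for the generated translation, and the whitespace counter is only a stand-in for a real tokenizer-based count.

```python
from typing import Callable, List


def chunk_paragraphs(
    text: str,
    count_tokens: Callable[[str], int],
    budget: int = 4096,  # assumed input budget, leaving headroom within the 8k context
) -> List[str]:
    """Greedily pack blank-line-separated paragraphs into chunks within `budget`.

    A single paragraph larger than `budget` is emitted as its own chunk
    and would need further splitting by the caller. Paragraph separators
    are not counted, so the totals are approximate.
    """
    chunks: List[str] = []
    current: List[str] = []
    used = 0
    for para in (p.strip() for p in text.split("\n\n")):
        if not para:
            continue
        n = count_tokens(para)
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and used + n > budget:
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(para)
        used += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks


def whitespace_count(s: str) -> int:
    """Stand-in token counter (an assumption); swap in a tokenizer-based count."""
    return len(s.split())
```

In practice, `count_tokens` could wrap the model tokenizer, for example `lambda s: len(tokenizer(s).input_ids)`. Chunking trades away some cross-paragraph context, so larger chunks that still fit the window are generally preferable.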
## Supported Languages

`Assamese`, `Bengali`, `Bodo`, `Dogri`, `Gujarati`, `English`, `Hindi`, `Kannada`, `Kashmiri`, `Konkani`, `Maithili`, `Malayalam`, `Manipuri`, `Marathi`, `Nepali`, `Odia`, `Punjabi`, `Sanskrit`, `Santali`, `Sindhi`, `Tamil`, `Telugu`, `Urdu`

## Quickstart
The following code snippet shows how to run Sarvam-Translate with the Transformers library.
```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "sarvamai/sarvam-translate"

# Load tokenizer and model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to('cuda:0')

# Translation task
tgt_lang = "Hindi"
input_txt = "Be the change you wish to see in the world."

# Chat-style message prompt
messages = [
    {"role": "system", "content": f"Translate the text below to {tgt_lang}."},
    {"role": "user", "content": input_txt}
]

# Apply chat template to structure the conversation
text = tokenizer.apply_chat_template(
    messages,
    tokenize=False,
    add_generation_prompt=True
)

# Tokenize and move input to model device
model_inputs = tokenizer([text], return_tensors="pt").to(model.device)

# Generate the output
generated_ids = model.generate(
    **model_inputs,
    max_new_tokens=1024,
    do_sample=True,
    temperature=0.01,
    num_return_sequences=1
)
output_ids = generated_ids[0][len(model_inputs.input_ids[0]):].tolist()
output_text = tokenizer.decode(output_ids, skip_special_tokens=True)

print("Input:", input_txt)
print("Translation:", output_text)

```
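
The prompt in the snippet above consists of a system message naming the target language and a user message carrying the source text. A small helper can build that prompt and reject unsupported target languages early. This is only a sketch: `SUPPORTED_LANGUAGES` and `build_messages` are hypothetical names, and the set simply mirrors the supported languages listed in this card.

```python
from typing import Dict, List

# Mirrors the supported languages listed in this model card.
SUPPORTED_LANGUAGES = {
    "Assamese", "Bengali", "Bodo", "Dogri", "Gujarati", "English", "Hindi",
    "Kannada", "Kashmiri", "Konkani", "Maithili", "Malayalam", "Manipuri",
    "Marathi", "Nepali", "Odia", "Punjabi", "Sanskrit", "Santali", "Sindhi",
    "Tamil", "Telugu", "Urdu",
}


def build_messages(input_txt: str, tgt_lang: str) -> List[Dict[str, str]]:
    """Build the chat-style prompt used in the Quickstart example.

    Raises ValueError if `tgt_lang` is not one of the listed languages.
    """
    if tgt_lang not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported target language: {tgt_lang!r}")
    return [
        {"role": "system", "content": f"Translate the text below to {tgt_lang}."},
        {"role": "user", "content": input_txt},
    ]
```

The returned list can be passed to `tokenizer.apply_chat_template` in place of the hand-written `messages` list.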

## vLLM Deployment


### Server:
```bash
vllm serve sarvamai/sarvam-translate --port 8000 --dtype bfloat16 --max-model-len 8192
```

### Client:
```python
from openai import OpenAI

# Modify OpenAI's API key and API base to use vLLM's API server.
openai_api_key = "EMPTY"
openai_api_base = "http://localhost:8000/v1"

client = OpenAI(
    api_key=openai_api_key,
    base_url=openai_api_base,
)

models = client.models.list()
model = models.data[0].id


tgt_lang = 'Hindi'
input_txt = 'Be the change you wish to see in the world.'
messages = [{"role": "system", "content": f"Translate the text below to {tgt_lang}."}, {"role": "user", "content": input_txt}]


response = client.chat.completions.create(model=model, messages=messages, temperature=0.01)
output_text = response.choices[0].message.content

print("Input:", input_txt)
print("Translation:", output_text)
```
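
When several segments need translating against the same server, the client call above can be wrapped in a small function. The sketch below assumes an OpenAI-compatible client object like the one created above. `translate_segments` is a hypothetical helper, and it sends each segment as an independent request, so no context is shared between segments.

```python
from typing import List


def translate_segments(
    client,
    model: str,
    segments: List[str],
    tgt_lang: str,
    temperature: float = 0.01,
) -> List[str]:
    """Translate each segment in order with one chat completion per segment.

    `client` is expected to expose `chat.completions.create`, as the
    OpenAI Python client does. Segments are translated independently.
    """
    outputs: List[str] = []
    for segment in segments:
        messages = [
            {"role": "system", "content": f"Translate the text below to {tgt_lang}."},
            {"role": "user", "content": segment},
        ]
        response = client.chat.completions.create(
            model=model, messages=messages, temperature=temperature
        )
        outputs.append(response.choices[0].message.content)
    return outputs
```

For example, `translate_segments(client, model, ["Good morning.", "Thank you."], "Hindi")` returns the translations in the same order as the inputs.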

## With Sarvam APIs

Refer to our [Python client documentation](https://pypi.org/project/sarvamai/).

Sample code:

```python
from sarvamai import SarvamAI
client = SarvamAI()
response = client.text.translate(
    input="Be the change you wish to see in the world.",
    source_language_code="en-IN",
    target_language_code="hi-IN",
    speaker_gender="Male",
    model="sarvam-translate:v1",
)
```