EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training
Paper: [arXiv:2603.02041](https://arxiv.org/abs/2603.02041)
Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.
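As a quick sanity check, the model can be loaded with the Hugging Face `transformers` library for plain text completion (assuming a `transformers` version that supports the underlying architecture). The snippet below is a minimal sketch; the repository id `tartuNLP/Apertus-EstLLM-8B-1125` is the one reported in the tables below, while the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal text-completion sketch; generation settings are illustrative only.
model_id = "tartuNLP/Apertus-EstLLM-8B-1125"  # repository id from the result tables below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# A plain Estonian prompt; the model continues the text rather than following instructions.
prompt = "Eesti Vabariik kuulutati välja"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```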
The original swiss-ai/Apertus-8B-2509 underwent continued pre-training for a single epoch on approximately 35B tokens.
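For readers who want to adapt another base model in the same spirit, the sketch below outlines single-epoch continued pre-training as a causal language modelling run with the `transformers` Trainer. It is an assumption-laden illustration: the actual EstLLM data mixture, sequence packing, and hyperparameters are described in the paper, and the data file, batch size, and learning rate here are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder corpus; the actual EstLLM pre-training mixture is described in the paper.
base_id = "swiss-ai/Apertus-8B-2509"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

raw = load_dataset("text", data_files={"train": "estonian_corpus.txt"})["train"]

def tokenize(batch):
    # Fixed-length truncation for simplicity; the real setup may use sequence packing instead.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="apertus-estllm-cpt",
    num_train_epochs=1,              # single epoch, as stated above
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # placeholder effective batch size
    learning_rate=2e-5,              # placeholder; not the paper's value
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```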
Estonian benchmark results:

| Model (# parameters ↓) | belebele-et | exam-et | grammar-et | inflection-et | trivia-et | winogrande-et | xcopa-et | GlobalPIQA-et |
|---|---|---|---|---|---|---|---|---|
| utter-project/EuroLLM-9B | 0.699 | 0.618 | 0.663 | 0.44 | 0.371 | 0.692 | 0.712 | 0.69 |
| mistralai/Ministral-3-8B-Base-2512 | 0.263 | 0.528 | 0.641 | 0.585 | 0.316 | 0.623 | 0.56 | 0.6 |
| swiss-ai/Apertus-8B-2509 | 0.768 | 0.607 | 0.789 | 0.478 | 0.329 | 0.711 | 0.678 | 0.73 |
| meta-llama/Llama-3.1-8B | 0.67 | 0.447 | 0.658 | 0.587 | 0.3 | 0.596 | 0.532 | 0.53 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.788 | 0.636 | 0.834 | 0.523 | 0.389 | 0.752 | 0.73 | 0.79 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.772 | 0.57 | 0.875 | 0.619 | 0.449 | 0.74 | 0.752 | 0.78 |
| tartuNLP/Llammas-base | 0.387 | 0.462 | 0.538 | 0.269 | 0.336 | 0.697 | 0.686 | 0.76 |
| BSC-LT/salamandra-7b | 0.448 | 0.505 | 0.699 | 0.268 | 0.296 | 0.673 | 0.658 | 0.71 |
| Qwen/Qwen2.5-7B | 0.664 | 0.455 | 0.654 | 0.452 | 0.29 | 0.53 | 0.494 | 0.54 |
English benchmark results:

| Model (# parameters ↓) | belebele-en | MMLU-Redux | winogrande |
|---|---|---|---|
| utter-project/EuroLLM-9B | 0.773 | 0.557 | 0.732 |
| mistralai/Ministral-3-8B-Base-2512 | 0.897 | 0.729 | 0.771 |
| swiss-ai/Apertus-8B-2509 | 0.827 | 0.598 | 0.761 |
| meta-llama/Llama-3.1-8B | 0.873 | 0.649 | 0.785 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.843 | 0.625 | 0.763 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.87 | 0.627 | 0.766 |
| tartuNLP/Llammas-base | 0.45 | 0.35 | 0.72 |
| BSC-LT/salamandra-7b | 0.531 | 0.449 | 0.706 |
| Qwen/Qwen2.5-7B | 0.912 | 0.75 | 0.751 |
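The accuracy-style benchmarks above are multiple-choice tasks, which are commonly scored by comparing the log-likelihood the model assigns to each candidate answer. The sketch below shows that idea for a single toy item; it is not the paper's evaluation harness, and the example question and candidates are invented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
model.eval()

def completion_logprob(context: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `context`."""
    # Assumes tokenizing context + completion starts with the context's own tokens,
    # which holds for typical prompts with a clean boundary between the two.
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(context + completion, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    target = full_ids[0, 1:]
    start = ctx_ids.shape[1] - 1  # only score the completion's tokens
    return log_probs[start:, :].gather(1, target[start:, None]).sum().item()

# Invented toy item, for illustration only.
question = "Küsimus: Mis on Eesti pealinn?\nVastus:"
candidates = [" Tallinn", " Tartu", " Narva"]
scores = [completion_logprob(question, c) for c in candidates]
print(candidates[scores.index(max(scores))])
```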
English–Estonian translation results on FLORES:

| Model (# parameters ↓) | flores en→et (BLEU) | flores et→en (BLEU) |
|---|---|---|
| utter-project/EuroLLM-9B | 29.0 | 41.2 |
| mistralai/Ministral-3-8B-Base-2512 | 12.6 | 29.6 |
| swiss-ai/Apertus-8B-2509 | 25.0 | 38.5 |
| meta-llama/Llama-3.1-8B | 13.5 | 33.7 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 27.4 | 37.4 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 28.1 | 36.8 |
| tartuNLP/Llammas-base | 22.0 | 32.7 |
| BSC-LT/salamandra-7b | 14.7 | 18.2 |
| Qwen/Qwen2.5-7B | 5.1 | 27.5 |
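Because these are base models, translation is usually elicited with a few-shot completion prompt rather than an instruction. The sketch below shows one plausible en→et prompt format and scores the output with `sacrebleu`; the prompt template, example sentences, and reference are illustrative and not the exact setup behind the FLORES scores above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import sacrebleu

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-0525"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Illustrative few-shot prompt; the paper's exact FLORES prompt may differ.
prompt = (
    "English: Good morning.\nEstonian: Tere hommikust.\n\n"
    "English: Thank you very much.\nEstonian: Suur tänu.\n\n"
    "English: The weather is nice today.\nEstonian:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40, do_sample=False)
text = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
hypothesis = text.strip().split("\n")[0]  # keep only the first generated line

# Corpus-level BLEU over one toy sentence; real FLORES evaluation uses the full test set.
reference = "Ilm on täna ilus."
print(hypothesis, sacrebleu.corpus_bleu([hypothesis], [[reference]]).score)
```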
In addition to the limitations of the original Apertus 8B model, this model has the following limitations:
@misc{dorkin2026estllmenhancingestoniancapabilities,
title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
year={2026},
eprint={2603.02041},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.02041},
}
Base model: swiss-ai/Apertus-8B-2509