EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training
Paper: [arXiv:2603.02041](https://arxiv.org/abs/2603.02041)
Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.
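As a quick sanity check, the model can be loaded with the Hugging Face `transformers` library for plain text completion (assuming a `transformers` version that supports the underlying architecture). The snippet below is a minimal sketch; the repository id `tartuNLP/Apertus-EstLLM-8B-1125` is the one reported in the tables below, while the prompt and generation settings are purely illustrative.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

# Minimal text-completion sketch; generation settings are illustrative only.
model_id = "tartuNLP/Apertus-EstLLM-8B-1125"  # repository id from the result tables below
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# A plain Estonian prompt; the model continues the text rather than following instructions.
prompt = "Eesti Vabariik kuulutati välja"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=50, do_sample=False)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```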
The original swiss-ai/Apertus-8B-2509 underwent continued pre-training for a single epoch on approximately 35B tokens.
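For readers who want to adapt another base model in the same spirit, the sketch below outlines single-epoch continued pre-training as a causal language modelling run with the `transformers` Trainer. It is an assumption-laden illustration: the actual EstLLM data mixture, sequence packing, and hyperparameters are described in the paper, and the data file, batch size, and learning rate here are placeholders.

```python
from datasets import load_dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

# Placeholder corpus; the actual EstLLM pre-training mixture is described in the paper.
base_id = "swiss-ai/Apertus-8B-2509"
tokenizer = AutoTokenizer.from_pretrained(base_id)
model = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")

raw = load_dataset("text", data_files={"train": "estonian_corpus.txt"})["train"]

def tokenize(batch):
    # Fixed-length truncation for simplicity; the real setup may use sequence packing instead.
    return tokenizer(batch["text"], truncation=True, max_length=2048)

train_ds = raw.map(tokenize, batched=True, remove_columns=["text"])

args = TrainingArguments(
    output_dir="apertus-estllm-cpt",
    num_train_epochs=1,              # single epoch, as stated above
    per_device_train_batch_size=1,
    gradient_accumulation_steps=32,  # placeholder effective batch size
    learning_rate=2e-5,              # placeholder; not the paper's value
    bf16=True,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=train_ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```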
Estonian benchmark results:

| Model (# parameters ↓) | belebele-et | exam-et | grammar-et | inflection-et | trivia-et | winogrande-et | xcopa-et | GlobalPIQA-et |
|---|---|---|---|---|---|---|---|---|
| utter-project/EuroLLM-9B | 0.699 | 0.618 | 0.663 | 0.44 | 0.371 | 0.692 | 0.712 | 0.69 |
| mistralai/Ministral-3-8B-Base-2512 | 0.263 | 0.528 | 0.641 | 0.585 | 0.316 | 0.623 | 0.56 | 0.6 |
| swiss-ai/Apertus-8B-2509 | 0.768 | 0.607 | 0.789 | 0.478 | 0.329 | 0.711 | 0.678 | 0.73 |
| meta-llama/Llama-3.1-8B | 0.67 | 0.447 | 0.658 | 0.587 | 0.3 | 0.596 | 0.532 | 0.53 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.788 | 0.636 | 0.834 | 0.523 | 0.389 | 0.752 | 0.73 | 0.79 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.772 | 0.57 | 0.875 | 0.619 | 0.449 | 0.74 | 0.752 | 0.78 |
| tartuNLP/Llammas-base | 0.387 | 0.462 | 0.538 | 0.269 | 0.336 | 0.697 | 0.686 | 0.76 |
| BSC-LT/salamandra-7b | 0.448 | 0.505 | 0.699 | 0.268 | 0.296 | 0.673 | 0.658 | 0.71 |
| Qwen/Qwen2.5-7B | 0.664 | 0.455 | 0.654 | 0.452 | 0.29 | 0.53 | 0.494 | 0.54 |
English benchmark results:

| Model (# parameters ↓) | belebele-en | MMLU-Redux | winogrande |
|---|---|---|---|
| utter-project/EuroLLM-9B | 0.773 | 0.557 | 0.732 |
| mistralai/Ministral-3-8B-Base-2512 | 0.897 | 0.729 | 0.771 |
| swiss-ai/Apertus-8B-2509 | 0.827 | 0.598 | 0.761 |
| meta-llama/Llama-3.1-8B | 0.873 | 0.649 | 0.785 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.843 | 0.625 | 0.763 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.87 | 0.627 | 0.766 |
| tartuNLP/Llammas-base | 0.45 | 0.35 | 0.72 |
| BSC-LT/salamandra-7b | 0.531 | 0.449 | 0.706 |
| Qwen/Qwen2.5-7B | 0.912 | 0.75 | 0.751 |
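The accuracy-style benchmarks above are multiple-choice tasks, which are commonly scored by comparing the log-likelihood the model assigns to each candidate answer. The sketch below shows that idea for a single toy item; it is not the paper's evaluation harness, and the example question and candidates are invented.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")
model.eval()

def completion_logprob(context: str, completion: str) -> float:
    """Sum of log-probabilities the model assigns to `completion` given `context`."""
    # Assumes tokenizing context + completion starts with the context's own tokens,
    # which holds for typical prompts with a clean boundary between the two.
    ctx_ids = tokenizer(context, return_tensors="pt").input_ids.to(model.device)
    full_ids = tokenizer(context + completion, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = torch.log_softmax(logits[0, :-1], dim=-1)
    target = full_ids[0, 1:]
    start = ctx_ids.shape[1] - 1  # only score the completion's tokens
    return log_probs[start:, :].gather(1, target[start:, None]).sum().item()

# Invented toy item, for illustration only.
question = "Küsimus: Mis on Eesti pealinn?\nVastus:"
candidates = [" Tallinn", " Tartu", " Narva"]
scores = [completion_logprob(question, c) for c in candidates]
print(candidates[scores.index(max(scores))])
```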
English–Estonian translation results on FLORES:

| Model (# parameters ↓) | flores en→et (BLEU) | flores et→en (BLEU) |
|---|---|---|
| utter-project/EuroLLM-9B | 29.0 | 41.2 |
| mistralai/Ministral-3-8B-Base-2512 | 12.6 | 29.6 |
| swiss-ai/Apertus-8B-2509 | 25.0 | 38.5 |
| meta-llama/Llama-3.1-8B | 13.5 | 33.7 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 27.4 | 37.4 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 28.1 | 36.8 |
| tartuNLP/Llammas-base | 22.0 | 32.7 |
| BSC-LT/salamandra-7b | 14.7 | 18.2 |
| Qwen/Qwen2.5-7B | 5.1 | 27.5 |
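Because these are base models, translation is usually elicited with a few-shot completion prompt rather than an instruction. The sketch below shows one plausible en→et prompt format and scores the output with `sacrebleu`; the prompt template, example sentences, and reference are illustrative and not the exact setup behind the FLORES scores above.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
import sacrebleu

model_id = "tartuNLP/Llama-3.1-EstLLM-8B-0525"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto", torch_dtype="auto")

# Illustrative few-shot prompt; the paper's exact FLORES prompt may differ.
prompt = (
    "English: Good morning.\nEstonian: Tere hommikust.\n\n"
    "English: Thank you very much.\nEstonian: Suur tänu.\n\n"
    "English: The weather is nice today.\nEstonian:"
)
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=40, do_sample=False)
text = tokenizer.decode(output[0][inputs.input_ids.shape[1]:], skip_special_tokens=True)
hypothesis = text.strip().split("\n")[0]  # keep only the first generated line

# Corpus-level BLEU over one toy sentence; real FLORES evaluation uses the full test set.
reference = "Ilm on täna ilus."
print(hypothesis, sacrebleu.corpus_bleu([hypothesis], [[reference]]).score)
```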
In addition to the limitations of the original Apertus 8B model, this model has the following limitations:
@misc{dorkin2026estllmenhancingestoniancapabilities,
title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
year={2026},
eprint={2603.02041},
archivePrefix={arXiv},
primaryClass={cs.CL},
url={https://arxiv.org/abs/2603.02041},
}
Base model: swiss-ai/Apertus-8B-2509