Apertus EstLLM 8B 1125 Base

Please note that this is a base text completion model that has not been instruction-tuned. It is intended for fine-tuning on downstream tasks rather than direct use for chat or instruction-following.
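
Since this is a plain causal LM, it can be used for text completion directly. A minimal sketch with the Hugging Face transformers library (the prompt and generation settings below are illustrative, not recommendations):

```python
# Minimal text-completion sketch; requires transformers and torch
# (device_map="auto" additionally needs accelerate).
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "tartuNLP/Apertus-EstLLM-8B-1125"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype="bfloat16", device_map="auto"
)

# Illustrative Estonian prompt; the model continues the text.
prompt = "Tartu Ülikool on"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```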

The original swiss-ai/Apertus-8B-2509 underwent continued pre-training on approximately 35B tokens. A single epoch was trained on:

  • Estonian National Corpus (8.6B tokens)
  • Python-Edu (3.3B tokens)
  • FineMath4-Plus (9.5B tokens)
  • General Instruction-Augmented Corpora (7.4B tokens)
  • Cosmopedia v2 (6.9B tokens)

Model Details

Model Description

  • Developed by: TartuNLP and TalTechNLP research groups
  • Funded by: Estonian Ministry of Education and Research, “Estonian Language Technology Program 2018-2027”
  • Model type: Causal Language Model
  • Language(s) (NLP): Estonian, English
  • License: Apache 2.0
  • Finetuned from model: swiss-ai/Apertus-8B-2509

Evaluation

Logits-based
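
Logits-based evaluation here presumably means multiple-choice scoring from the model's logits rather than from generated text: each answer option is appended to the prompt, and the option to which the model assigns the highest log-likelihood is taken as the prediction. A minimal sketch of that scoring scheme (the exact prompting and normalization behind the reported numbers are not specified in this card; see the paper for the precise setup):

```python
# Hedged sketch of multiple-choice scoring from logits: each option is
# scored by the summed log-probability of its tokens given the context,
# and the highest-scoring option is the prediction.
# Simplification: assumes tokenizing `context` yields a prefix of
# tokenizing `context + option`, which can break for some tokenizers.
import torch
import torch.nn.functional as F

def option_loglik(model, tokenizer, context: str, option: str) -> float:
    ctx_len = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    full_ids = tokenizer(context + option, return_tensors="pt").input_ids.to(model.device)
    with torch.no_grad():
        logprobs = F.log_softmax(model(full_ids).logits, dim=-1)
    # Logits at position i-1 predict the token at position i.
    return sum(
        logprobs[0, i - 1, full_ids[0, i]].item()
        for i in range(ctx_len, full_ids.shape[1])
    )

def predict(model, tokenizer, context: str, options: list[str]) -> str:
    return max(options, key=lambda o: option_loglik(model, tokenizer, context, o))
```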

Estonian

| Model (# parameters ↓) | belebele-et | exam-et | grammar-et | inflection-et | trivia-et | winogrande-et | xcopa-et | GlobalPIQA-et |
|---|---|---|---|---|---|---|---|---|
| utter-project/EuroLLM-9B | 0.699 | 0.618 | 0.663 | 0.44 | 0.371 | 0.692 | 0.712 | 0.69 |
| mistralai/Ministral-3-8B-Base-2512 | 0.263 | 0.528 | 0.641 | 0.585 | 0.316 | 0.623 | 0.56 | 0.6 |
| swiss-ai/Apertus-8B-2509 | 0.768 | 0.607 | 0.789 | 0.478 | 0.329 | 0.711 | 0.678 | 0.73 |
| meta-llama/Llama-3.1-8B | 0.67 | 0.447 | 0.658 | 0.587 | 0.3 | 0.596 | 0.532 | 0.53 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.788 | 0.636 | 0.834 | 0.523 | 0.389 | 0.752 | 0.73 | 0.79 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.772 | 0.57 | 0.875 | 0.619 | 0.449 | 0.74 | 0.752 | 0.78 |
| tartuNLP/Llammas-base | 0.387 | 0.462 | 0.538 | 0.269 | 0.336 | 0.697 | 0.686 | 0.76 |
| BSC-LT/salamandra-7b | 0.448 | 0.505 | 0.699 | 0.268 | 0.296 | 0.673 | 0.658 | 0.71 |
| Qwen/Qwen2.5-7B | 0.664 | 0.455 | 0.654 | 0.452 | 0.29 | 0.53 | 0.494 | 0.54 |

English

| Model (# parameters ↓) | belebele-en | MMLU-Redux | winogrande |
|---|---|---|---|
| utter-project/EuroLLM-9B | 0.773 | 0.557 | 0.732 |
| mistralai/Ministral-3-8B-Base-2512 | 0.897 | 0.729 | 0.771 |
| swiss-ai/Apertus-8B-2509 | 0.827 | 0.598 | 0.761 |
| meta-llama/Llama-3.1-8B | 0.873 | 0.649 | 0.785 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 0.843 | 0.625 | 0.763 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 0.87 | 0.627 | 0.766 |
| tartuNLP/Llammas-base | 0.45 | 0.35 | 0.72 |
| BSC-LT/salamandra-7b | 0.531 | 0.449 | 0.706 |
| Qwen/Qwen2.5-7B | 0.912 | 0.75 | 0.751 |

Translation

| Model (# parameters ↓) | flores en→et (BLEU) | flores et→en (BLEU) |
|---|---|---|
| utter-project/EuroLLM-9B | 29.0 | 41.2 |
| mistralai/Ministral-3-8B-Base-2512 | 12.6 | 29.6 |
| swiss-ai/Apertus-8B-2509 | 25.0 | 38.5 |
| meta-llama/Llama-3.1-8B | 13.5 | 33.7 |
| tartuNLP/Apertus-EstLLM-8B-1125 | 27.4 | 37.4 |
| tartuNLP/Llama-3.1-EstLLM-8B-0525 | 28.1 | 36.8 |
| tartuNLP/Llammas-base | 22.0 | 32.7 |
| BSC-LT/salamandra-7b | 14.7 | 18.2 |
| Qwen/Qwen2.5-7B | 5.1 | 27.5 |
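
The scores above are presumably sacrebleu-style corpus BLEU on the FLORES benchmark; a hedged illustration of how such scores are computed (the hypotheses and references below are placeholders, and the exact evaluation settings behind the reported numbers are not specified in this card):

```python
# Hedged sketch: corpus-level BLEU with sacrebleu. Tokenization and
# other settings may differ from those used for the reported scores.
import sacrebleu

hypotheses = ["Tartu on Eesti suuruselt teine linn."]    # model outputs (placeholder)
references = [["Tartu on Eesti suuruselt teine linn."]]  # one reference stream

bleu = sacrebleu.corpus_bleu(hypotheses, references)
print(f"BLEU = {bleu.score:.1f}")
```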

Limitations

In addition to the limitations of the original Apertus 8B model, this model has the following limitations:

  • Somewhat limited context size, as continued pre-training was performed with a sequence length of 4,096 tokens (see the sketch below).
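
As a hedged illustration, one way to keep the prompt plus the generated continuation within this window when preparing inputs (the truncation strategy itself is an assumption, not an official recommendation):

```python
# Hedged sketch: budget prompt tokens so that prompt + generation stays
# within the 4,096-token window used during continued pre-training.
MAX_CONTEXT = 4096

def build_inputs(tokenizer, text: str, max_new_tokens: int = 256):
    budget = MAX_CONTEXT - max_new_tokens  # leave room for generated tokens
    return tokenizer(text, truncation=True, max_length=budget, return_tensors="pt")
```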

Citation

```bibtex
@misc{dorkin2026estllmenhancingestoniancapabilities,
      title={{EstLLM: Enhancing Estonian Capabilities in Multilingual LLMs via Continued Pretraining and Post-Training}},
      author={Aleksei Dorkin and Taido Purason and Emil Kalbaliyev and Hele-Andra Kuulmets and Marii Ojastu and Mark Fišel and Tanel Alumäe and Eleri Aedmaa and Krister Kruusmaa and Kairit Sirts},
      year={2026},
      eprint={2603.02041},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2603.02041},
}
```