---
license: apache-2.0
language:
- en
- es
- fr
- de
- it
- pt
- ru
- ar
- hi
- ko
- zh
library_name: transformers
base_model:
- arcee-ai/Trinity-Nano-Base-Pre-Anneal
---
# Trinity Nano Base
Trinity Nano is an Arcee AI 6B MoE model with 1B active parameters. It is the small-sized model in our new Trinity family, a series of open-weight models for enterprise and tinkerers alike.
This is the base model prior to any fine-tuning, so it is not suitable for chat out of the box and should be post-trained for your specific domain before use.
Trinity Nano was trained on 10T tokens gathered and curated through a key partnership with Datology, building on the dataset we used for AFM-4.5B with additional math and code data.
Training was performed on a cluster of 512 H200 GPUs powered by Prime Intellect using HSDP parallelism.
More details, including key architecture decisions, can be found on our blog.
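As a rough starting point, the snippet below sketches loading the model for plain text completion with `transformers`. The repository id `arcee-ai/Trinity-Nano-Base` and the `trust_remote_code=True` flag are assumptions rather than confirmed details; adjust them to the published checkpoint name and to whatever your `transformers` version requires for the AfmoeForCausalLM architecture.

```python
# Minimal sketch: load the base model for plain text completion.
# The repo id below is an assumption; substitute the actual checkpoint name.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "arcee-ai/Trinity-Nano-Base"  # hypothetical repo id

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.bfloat16,   # bf16 keeps the MoE weights to a reasonable memory footprint
    device_map="auto",
    trust_remote_code=True,       # may be needed if AfmoeForCausalLM is not yet in your transformers release
)

# Base model: plain continuation, not chat.
prompt = "The three laws of thermodynamics are"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```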
## Model Details
- Model Architecture: AfmoeForCausalLM
- Parameters: 6B, 1B active
- Experts: 128 total, 8 active, 1 shared
- Context length: 128k
- Training Tokens: 10T
- License: Apache 2.0
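Because this checkpoint ships without post-training, a typical next step is continued training on your own domain corpus. The sketch below uses the stock `transformers` Trainer with a causal-LM collator; the file name `domain_corpus.txt`, the hyperparameters, and the repo id are placeholders, not recommended settings.

```python
# Minimal domain fine-tuning sketch using the standard Trainer API.
# Dataset file, hyperparameters, and repo id are placeholders.
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_id = "arcee-ai/Trinity-Nano-Base"  # hypothetical repo id
tokenizer = AutoTokenizer.from_pretrained(model_id)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token  # ensure padding works for the collator
model = AutoModelForCausalLM.from_pretrained(model_id, trust_remote_code=True)

# Replace with your own domain corpus.
raw = load_dataset("text", data_files={"train": "domain_corpus.txt"})

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=4096)

train = raw["train"].map(tokenize, batched=True, remove_columns=["text"])
collator = DataCollatorForLanguageModeling(tokenizer, mlm=False)  # causal LM objective

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="trinity-nano-domain",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=16,
        learning_rate=2e-5,
        num_train_epochs=1,
        bf16=True,
        logging_steps=10,
    ),
    train_dataset=train,
    data_collator=collator,
)
trainer.train()
```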
## Benchmarks
### Math & Reasoning
| Benchmark | Score |
|---|---|
| GSM8K | 58.4% |
| Minerva Math 500 | 36.0% |
| DROP (0-shot) | 4.5% |
| DROP (5-shot) | 63.6% |
### Code Generation
| Benchmark | Pass@1 | Pass@10 |
|---|---|---|
| HumanEval (3-shot, bpb) | 36.3% | - |
| HumanEval+ (temp 0.8) | 31.7% | - |
| MBPP+ | 44.7% | - |
### Knowledge & Reasoning
| Benchmark | 5-shot | 0-shot |
|---|---|---|
| ARC-Challenge | 84.0% | 78.2% |
| ARC-Easy | 94.8% | 91.2% |
| CommonsenseQA | 74.9% | 62.7% |
| OpenBookQA | 82.2% | 75.2% |
| WinoGrande | 72.8% | 68.0% |
| MMLU | 67.7% | 64.2% |
| MMLU Pro | 35.8% | 27.7% |
| AGI Eval (English) | 51.8% | - |
| BBH (CoT) | 50.4% | 7.6% |
### Understanding & QA
| Benchmark | Score |
|---|---|
| BoolQ (5-shot) | 84.3% |
| HellaSwag (5-shot) | 77.4% |
| PIQA (5-shot) | 82.2% |
| SciQ (5-shot) | 93.2% |
| Social IQA (5-shot) | 73.0% |
## License
Trinity-Nano-Base is released under the Apache-2.0 license.