GPT-Hilberg-1M
This is a 1M-parameter GPT-style autoregressive language model trained on the July 20, 2025 English Wikipedia dump, intended for experiments on entropy scaling and Hilberg's conjecture. More information on the experiments is available here, and the dataset is available here.
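For context, a common formulation of Hilberg's conjecture (paraphrased here, not taken from this card) concerns how the block entropy of natural language grows with block length:

```latex
% Hilberg's conjecture (paraphrased, relaxed form with a linear term):
% for natural-language text blocks of length n, the block entropy satisfies
H(n) \approx A\, n^{\beta} + h\, n, \qquad 0 < \beta < 1,
% where Hilberg's original estimate from Shannon's guessing-game data was
% \beta \approx 0.5. Models in this suite are used to study how measured
% cross-entropy scales with context length.
```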
This model is part of the following suite:
- GPT-Hilberg-1M (128 ctx)
- GPT-Hilberg-5M (512 ctx)
- GPT-Hilberg-10M (1024 ctx)
Architecture details
- Parameters: ~1M
- Layers: 4
- Model dimension: 64
- Attention heads: 4
- FF dimension: 256
- Context length: 128
- Vocabulary size: 16,000
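The hyperparameters above can be instantiated, for example, with a GPT-2-style configuration from Hugging Face `transformers`. This is a minimal sketch, not the training code used for this model, and the exact parameter count depends on implementation details such as weight tying:

```python
# Sketch only: a GPT-2-style model matching the listed hyperparameters.
# The actual training implementation for GPT-Hilberg-1M may differ.
from transformers import GPT2Config, GPT2LMHeadModel

config = GPT2Config(
    vocab_size=16_000,  # SentencePiece BPE vocabulary
    n_positions=128,    # context length
    n_embd=64,          # model dimension
    n_layer=4,          # transformer layers
    n_head=4,           # attention heads
    n_inner=256,        # feed-forward dimension
)
model = GPT2LMHeadModel(config)

# With tied input/output embeddings this comes to roughly 1.2M parameters.
print(f"parameters: {sum(p.numel() for p in model.parameters()):,}")
```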
Training details
- Dataset: English Wikipedia (2025-07 dump)
- Tokenizer: SentencePiece BPE
- Objective: Autoregressive next-token prediction
- Optimizer: AdamW
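As a usage sketch, the quantity of interest for entropy-scaling experiments is the model's average cross-entropy on text. The snippet below assumes the checkpoint is stored in `transformers` format and that the SentencePiece model is available; the paths `gpt-hilberg-1m` and `tokenizer.model` are hypothetical placeholders:

```python
# Sketch only: score a text sample in bits per token.
import math
import torch
import sentencepiece as spm
from transformers import GPT2LMHeadModel

sp = spm.SentencePieceProcessor(model_file="tokenizer.model")  # hypothetical path
model = GPT2LMHeadModel.from_pretrained("gpt-hilberg-1m")      # hypothetical checkpoint
model.eval()

text = "Wikipedia is a free online encyclopedia."
ids = torch.tensor([sp.encode(text)[: model.config.n_positions]])

with torch.no_grad():
    out = model(ids, labels=ids)  # labels are shifted internally for next-token prediction

# Convert the natural-log cross-entropy loss to bits per token.
print(f"{out.loss.item() / math.log(2):.2f} bits/token")
```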