GPT-Hilberg-1M

This is a 1M-parameter autoregressive GPT language model trained on the July 20, 2025 English Wikipedia dump for experiments on entropy scaling and Hilberg's conjecture. More information on the project is available here, and the dataset is available here.
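As an illustration of the intended use, the sketch below loads the checkpoint and measures mean per-token cross-entropy at a few context lengths, the kind of entropy-scaling curve that Hilberg's conjecture concerns. It assumes the checkpoint can be loaded through transformers' AutoModelForCausalLM; the repo id and the sample text are placeholders, not documented facts about this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GPT-Hilberg-1M"  # placeholder: substitute the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Any sufficiently long passage works; Wikipedia text matches the training data.
text = "Hilberg's conjecture concerns how the entropy of natural language grows with text length. " * 20
ids = tokenizer(text, return_tensors="pt").input_ids[0]

for n in (16, 32, 64, 128):  # up to the model's 128-token context window
    window = ids[:n].unsqueeze(0)
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token cross-entropy (in nats).
        loss = model(window, labels=window).loss
    print(f"context {n:4d}: {loss.item():.3f} nats/token")
```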

This model is part of the following suite:

  • GPT-Hilberg-1M (128 ctx)
  • GPT-Hilberg-5M (512 ctx)
  • GPT-Hilberg-10M (1024 ctx)

Architecture details

  • Parameters: ~1M
  • Layers: 4
  • Model dimension: 64
  • Attention heads: 4
  • FF dimension: 256
  • Context length: 128
  • Vocabulary size: 16,000
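The hyperparameters above describe a small decoder-only transformer. As a rough sketch, they map onto a GPT-2-style configuration like the one below; this uses Hugging Face's GPT2Config purely for illustration, and the model's own training code may define its configuration differently.

```python
from transformers import GPT2Config

# Illustrative mapping of the listed hyperparameters; field names follow GPT2Config,
# not necessarily the model's own implementation.
config = GPT2Config(
    vocab_size=16_000,  # SentencePiece BPE vocabulary
    n_positions=128,    # context length
    n_embd=64,          # model dimension
    n_layer=4,          # transformer blocks
    n_head=4,           # attention heads per block
    n_inner=256,        # feed-forward dimension
)
```

At this scale most of the parameter budget sits in the embedding matrix: 16,000 × 64 ≈ 1.0M weights, with the four transformer blocks adding roughly another 0.2M.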

Training details

  • Dataset: English Wikipedia (2025-07 dump)
  • Tokenizer: SentencePiece BPE
  • Objective: autoregressive (next-token) language modeling
  • Optimizer: AdamW
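
A SentencePiece BPE tokenizer with the 16,000-token vocabulary listed above can be reproduced roughly as follows; the input path and any options beyond model type and vocabulary size are placeholders, not the exact settings used for this model.

```python
import sentencepiece as spm

# Train a BPE SentencePiece model on the extracted Wikipedia text.
# The input path is a placeholder; other options are left at their defaults.
spm.SentencePieceTrainer.train(
    input="wikipedia_2025_07.txt",  # placeholder path to the plain-text dump
    model_prefix="hilberg_bpe",
    model_type="bpe",
    vocab_size=16_000,
)

sp = spm.SentencePieceProcessor(model_file="hilberg_bpe.model")
print(sp.encode("Entropy scaling in natural language.", out_type=str))
```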