GPT-Hilberg-1M

This is a 1M-parameter autoregressive GPT language model trained on the July 20, 2025 English Wikipedia dump for experiments on entropy scaling and Hilberg's conjecture. More information on the project is available here, and the dataset is available here.
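As an illustration of the intended use, the sketch below loads the checkpoint and measures mean per-token cross-entropy at a few context lengths, the kind of entropy-scaling curve that Hilberg's conjecture concerns. It assumes the checkpoint can be loaded through transformers' AutoModelForCausalLM; the repo id and the sample text are placeholders, not documented facts about this model.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "GPT-Hilberg-1M"  # placeholder: substitute the actual Hub repo id
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id)
model.eval()

# Any sufficiently long passage works; Wikipedia text matches the training data.
text = "Hilberg's conjecture concerns how the entropy of natural language grows with text length. " * 20
ids = tokenizer(text, return_tensors="pt").input_ids[0]

for n in (16, 32, 64, 128):  # up to the model's 128-token context window
    window = ids[:n].unsqueeze(0)
    with torch.no_grad():
        # Passing labels makes the model return the mean next-token cross-entropy (in nats).
        loss = model(window, labels=window).loss
    print(f"context {n:4d}: {loss.item():.3f} nats/token")
```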

This model is part of the following suite:

  • GPT-Hilberg-1M (128 ctx)
  • GPT-Hilberg-5M (512 ctx)
  • GPT-Hilberg-10M (1024 ctx)

Architecture details

  • Parameters: ~1M
  • Layers: 4
  • Model dimension: 64
  • Attention heads: 4
  • FF dimension: 256
  • Context length: 128
  • Vocabulary size: 16,000
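The hyperparameters above describe a small decoder-only transformer. As a rough sketch, they map onto a GPT-2-style configuration like the one below; this uses Hugging Face's GPT2Config purely for illustration, and the model's own training code may define its configuration differently.

```python
from transformers import GPT2Config

# Illustrative mapping of the listed hyperparameters; field names follow GPT2Config,
# not necessarily the model's own implementation.
config = GPT2Config(
    vocab_size=16_000,  # SentencePiece BPE vocabulary
    n_positions=128,    # context length
    n_embd=64,          # model dimension
    n_layer=4,          # transformer blocks
    n_head=4,           # attention heads per block
    n_inner=256,        # feed-forward dimension
)
```

At this scale most of the parameter budget sits in the embedding matrix: 16,000 × 64 ≈ 1.0M weights, with the four transformer blocks adding roughly another 0.2M.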

Training details

  • Dataset: English Wikipedia (2025-07 dump)
  • Tokenizer: SentencePiece BPE
  • Objective: autoregressive (next-token) language modeling
  • Optimizer: AdamW
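
A SentencePiece BPE tokenizer with the 16,000-token vocabulary listed above can be reproduced roughly as follows; the input path and any options beyond model type and vocabulary size are placeholders, not the exact settings used for this model.

```python
import sentencepiece as spm

# Train a BPE SentencePiece model on the extracted Wikipedia text.
# The input path is a placeholder; other options are left at their defaults.
spm.SentencePieceTrainer.train(
    input="wikipedia_2025_07.txt",  # placeholder path to the plain-text dump
    model_prefix="hilberg_bpe",
    model_type="bpe",
    vocab_size=16_000,
)

sp = spm.SentencePieceProcessor(model_file="hilberg_bpe.model")
print(sp.encode("Entropy scaling in natural language.", out_type=str))
```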