- Paper: NVIDIA Nemotron Nano 2: An Accurate and Efficient Hybrid Mamba-Transformer Reasoning Model • 2508.14444 • Published Aug 20, 2025
- Model: Bochkov/growing-transformers-model-frozen-16-bit-baseline-monolyth-181m • Text Generation
- Model: Bochkov/growing-transformers-model-unfrozen-baseline-monolyth-247m • Text Generation
- Model: Bochkov/growing-transformers-model-frozen-unicode-baseline-monolyth-247m • Text Generation
- Article: Emergent Semantics Beyond Token Embeddings: A GPT-like Transformer Learns with Frozen 16‑D Binary Token-ID Embeddings (n_embed=16)