
# Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient - Checkpoints

This repository provides the inference code for running the models used in https://arxiv.org/abs/2502.05172 (accepted to ICML 2025). Checkpoints are organized as `./<dmodel>/<n_experts>/<n_att_heads>/<n_training_steps>.pt`, i.e. by model dimension, number of experts, number of attention heads, and number of training steps.
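
This layout makes it easy to enumerate checkpoints by hyperparameter. Below is a minimal sketch using only the standard library; the directory pattern comes from the description above, while the helper itself is illustrative.

```python
# Sketch: enumerate the checkpoints in this repo and parse the hyperparameters
# encoded in each path, following the ./<dmodel>/<n_experts>/<n_att_heads>/<n_training_steps>.pt
# layout described above. The helper itself is illustrative, not part of the repo.
from pathlib import Path

def list_checkpoints(root: str = "."):
    """Yield (dmodel, n_experts, n_att_heads, n_training_steps, path) per checkpoint."""
    for path in sorted(Path(root).glob("*/*/*/*.pt")):
        dmodel, n_experts, n_att_heads = (int(part) for part in path.parts[-4:-1])
        yield dmodel, n_experts, n_att_heads, int(path.stem), path

for dmodel, n_experts, n_heads, steps, path in list_checkpoints():
    print(f"d_model={dmodel} experts={n_experts} heads={n_heads} steps={steps}: {path}")
```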

## Running the models

Example code for loading a model from a checkpoint path and running evaluations can be found in `./benchmark.py`.
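
For a quick look at a single checkpoint outside of `./benchmark.py`, a loading sketch is shown below. It assumes the `.pt` files contain a PyTorch state dict (or a dict wrapping one); the real model constructor and evaluation loop live in `./benchmark.py`, and `build_model` here is a hypothetical placeholder for that constructor.

```python
# Sketch of inspecting a single checkpoint, assuming the .pt files hold a
# PyTorch state dict (possibly wrapped in a dict). The model constructor below
# is a hypothetical placeholder; the real one is defined in ./benchmark.py.
import torch

ckpt_path = "./1024/32/16/100000.pt"  # hypothetical example path
checkpoint = torch.load(ckpt_path, map_location="cpu")

# Some checkpoints wrap the weights under a key such as "model"; otherwise treat
# the loaded object itself as the state dict.
state_dict = checkpoint.get("model", checkpoint) if isinstance(checkpoint, dict) else checkpoint
print(f"{len(state_dict)} tensors, e.g. {next(iter(state_dict))}")

# model = build_model(d_model=1024, n_experts=32, n_att_heads=16)  # see benchmark.py
# model.load_state_dict(state_dict)
# model.eval()
```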

## Citation

```bibtex
@misc{ludziejewski2025jointmoescalinglaws,
      title={Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient},
      author={Jan Ludziejewski and Maciej Pióro and Jakub Krajewski and Maciej Stefaniak and Michał Krutul and Jan Małaśnicki and Marek Cygan and Piotr Sankowski and Kamil Adamczewski and Piotr Miłoś and Sebastian Jaszczur},
      year={2025},
      eprint={2502.05172},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.05172},
}
```