# Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient - Checkpoints
This repo provides the inference code for running the models used in https://arxiv.org/abs/2502.05172 (accepted to ICML 2025).

The checkpoints are structured as `./<dmodel>/<n_experts>/<n_att_heads>/<n_training_steps>.pt`.
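As a minimal sketch of this layout, a checkpoint path can be assembled from the four configuration values as below (all concrete values here are hypothetical; substitute those of the checkpoint you need):

```python
import os

# Hypothetical configuration values; replace with those of the checkpoint you want.
dmodel = 512
n_experts = 32
n_att_heads = 8
n_training_steps = 10000

# Follows the ./<dmodel>/<n_experts>/<n_att_heads>/<n_training_steps>.pt layout.
checkpoint_path = os.path.join(
    ".", str(dmodel), str(n_experts), str(n_att_heads), f"{n_training_steps}.pt"
)
print(checkpoint_path)  # ./512/32/8/10000.pt
```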
## Running the models

Example code for loading a model from a checkpoint path and running evaluations can be found in `./benchmark.py`.
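If you only want to inspect a checkpoint before running the full benchmark, a minimal sketch is shown below. It assumes the `.pt` files are standard `torch.save` artifacts (this is an assumption, not something the repo documents); the actual model construction and evaluation logic live in `./benchmark.py`.

```python
import torch

# Hypothetical path built from the layout above.
checkpoint_path = "./512/32/8/10000.pt"

# Load on CPU; assumes the .pt file is a standard torch.save artifact
# (e.g. a state dict, or a dict containing one). The model definition that
# consumes it is handled in ./benchmark.py.
checkpoint = torch.load(checkpoint_path, map_location="cpu")

# Inspect the top-level keys to see what the checkpoint contains.
if isinstance(checkpoint, dict):
    print(list(checkpoint.keys())[:10])
```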
## Citation

```bibtex
@misc{ludziejewski2025jointmoescalinglaws,
      title={Joint MoE Scaling Laws: Mixture of Experts Can Be Memory Efficient},
      author={Jan Ludziejewski and Maciej Pióro and Jakub Krajewski and Maciej Stefaniak and Michał Krutul and Jan Małaśnicki and Marek Cygan and Piotr Sankowski and Kamil Adamczewski and Piotr Miłoś and Sebastian Jaszczur},
      year={2025},
      eprint={2502.05172},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2502.05172},
}
```