# MiniMax-M2.5-Uncensored-4bit
A 4-bit quantized, uncensored variant of MiniMax-M2.5, built from vpyn/MiniMax-M2.5-CARVE-v1-BF16.
This model is optimized for the MLX ecosystem on Apple Silicon and works especially well with mlx-openai-server, an OpenAI-compatible local inference server for MLX models that supports reasoning parsers, tool-call parsers, streaming, and multi-model serving.
## Model description
This model is derived from the official MiniMax-M2.5 (MiniMaxAI/MiniMax-M2.5), a 229B-parameter frontier model strong in coding, tool use, search, and office tasks. The uncensored base MiniMax-M2.5-CARVE-v1-BF16 was created via CARVE-style uncensoring; this repository provides a 4-bit quantized version for lower memory use and faster inference while retaining the same architecture and chat format.
- Architecture: MiniMaxM2ForCausalLM (`minimax_m2`)
- Layers: 62
- Hidden size: 3072
- Vocabulary size: 200,064
- Quantization: 4-bit (reduced size vs. BF16)
- Base model: MiniMaxAI/MiniMax-M2.5
- Uncensored base: vpyn/MiniMax-M2.5-CARVE-v1-BF16
## Intended use
This model is intended for users who want:
- The capabilities of MiniMax-M2.5 in a smaller, faster 4-bit form.
- An uncensored variant for research, creative writing, or assistant use cases that require unfiltered outputs.
**Disclaimer:** This is an uncensored model. Outputs can be unfiltered. Use responsibly and in accordance with your local laws and policies.
## Installation
The recommended way to run this model is with mlx-openai-server on Apple Silicon.
### Install mlx-openai-server
```shell
python3 -m venv .venv
source .venv/bin/activate
pip install -U mlx-openai-server
```
To use the latest development version instead:
```shell
pip install -U git+https://github.com/cubist38/mlx-openai-server.git
```
### Launch the model
```shell
mlx-openai-server launch \
  --model-path MiniMax-M2.5-Uncensored-4bit \
  --model-type lm \
  --reasoning-parser minimax_m2 \
  --tool-call-parser minimax_m2 \
  --trust-remote-code
```
This exposes an OpenAI-compatible API at http://localhost:8000/v1, making the model easy to use from existing OpenAI SDKs, apps, and agent frameworks.
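As a quick smoke test, the endpoint can be called with nothing but the Python standard library. The `build_payload` and `chat` helpers below are illustrative sketches, not part of mlx-openai-server; the official `openai` SDK pointed at `base_url="http://localhost:8000/v1"` works just as well.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/v1"  # default mlx-openai-server address


def build_payload(prompt, model="MiniMax-M2.5-Uncensored-4bit",
                  max_tokens=256, temperature=1.0, top_p=0.95):
    """Assemble an OpenAI-style chat completion request body."""
    return {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
        "top_p": top_p,
    }


def chat(prompt, **kwargs):
    """POST the request to the local server and return the reply text."""
    req = urllib.request.Request(
        BASE_URL + "/chat/completions",
        data=json.dumps(build_payload(prompt, **kwargs)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["choices"][0]["message"]["content"]


# Requires the server launched above to be running:
# print(chat("Hello, how are you?"))
```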
## How to use
This is an MLX model for Apple Silicon. The recommended serving path is mlx-openai-server, and you can also run it directly with mlx_lm.
### Python (mlx_lm)
```python
from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_path = "MiniMax-M2.5-Uncensored-4bit"  # or your local path

model, tokenizer = load(
    model_path,
    tokenizer_config={"trust_remote_code": True},
)

# Recent mlx_lm releases take sampling settings via a sampler object
# instead of temp/top_p keyword arguments on generate().
sampler = make_sampler(temp=1.0, top_p=0.95)

response = generate(
    model,
    tokenizer,
    prompt="Hello, how are you?",
    max_tokens=256,
    sampler=sampler,
    verbose=True,
)
print(response)
```
For chat, format messages with the model's chat template (e.g. the repo's `chat_template.jinja`, or the tokenizer's `apply_chat_template` method) before passing the resulting string as `prompt`.
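A minimal chat wrapper along those lines might look like this. The `chat_generate` helper is illustrative, not part of mlx_lm; `apply_chat_template` comes from the Hugging Face tokenizer that `mlx_lm.load` returns.

```python
def chat_generate(messages, model_path="MiniMax-M2.5-Uncensored-4bit",
                  max_tokens=256):
    """Render `messages` with the model's chat template, then generate."""
    # Imported lazily so this snippet can be read without mlx installed.
    from mlx_lm import load, generate

    model, tokenizer = load(
        model_path,
        tokenizer_config={"trust_remote_code": True},
    )
    # Produce a prompt string that includes the generation cue for the
    # assistant turn, matching the model's expected chat format.
    prompt = tokenizer.apply_chat_template(
        messages, tokenize=False, add_generation_prompt=True
    )
    return generate(model, tokenizer, prompt=prompt, max_tokens=max_tokens)


# Usage (downloads/loads the model on first call):
# print(chat_generate([{"role": "user", "content": "Hello!"}]))
```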
### Server (mlx-openai-server)
```shell
mlx-openai-server launch \
  --model-path MiniMax-M2.5-Uncensored-4bit \
  --model-type lm \
  --reasoning-parser minimax_m2 \
  --tool-call-parser minimax_m2 \
  --trust-remote-code
```
mlx-openai-server is the best fit if you want OpenAI-compatible endpoints, streaming responses, structured outputs, reasoning/tool-call parsing, and easy integration with existing clients.
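For streaming, the chat completions endpoint accepts `"stream": true` and replies with server-sent events (`data: {...}` lines). A stdlib-only sketch, with illustrative helper names:

```python
import json
import urllib.request


def parse_sse_chunk(line):
    """Extract the text delta from one `data: {...}` SSE line, or None."""
    line = line.strip()
    if not line.startswith(b"data: "):
        return None
    data = line[len(b"data: "):]
    if data == b"[DONE]":  # end-of-stream sentinel
        return None
    event = json.loads(data)
    return event["choices"][0]["delta"].get("content")


def stream_chat(prompt, url="http://localhost:8000/v1/chat/completions",
                model="MiniMax-M2.5-Uncensored-4bit"):
    """Print text deltas as the server streams them."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
    }
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for line in resp:
            delta = parse_sse_chunk(line)
            if delta:
                print(delta, end="", flush=True)


# stream_chat("Tell me a short story.")  # needs the server running
```

The official `openai` Python SDK handles the same streaming protocol for you if you prefer a client library.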
## Acknowledgments
- MiniMaxAI/MiniMax-M2.5 — Original model and model card.
- vpyn/MiniMax-M2.5-CARVE-v1-BF16 — Uncensored CARVE variant used as the base for this 4-bit version.
## License
Follow the license terms of the original MiniMax-M2.5 model (Modified-MIT). See MiniMax-M2.5 and the MiniMax-M2.5 GitHub LICENSE for details.