
MiniMax-M2.5-Uncensored-4bit

A 4-bit quantized, uncensored variant of MiniMax-M2.5, built from vpyn/MiniMax-M2.5-CARVE-v1-BF16.

Optimized for the MLX ecosystem on Apple Silicon, this repo works especially well with mlx-openai-server, an OpenAI-compatible local inference server for MLX models with support for reasoning parsers, tool-call parsers, streaming, and multi-model serving.

Model description

This model is derived from the official MiniMax-M2.5 (MiniMaxAI/MiniMax-M2.5), a 229B-parameter frontier model strong in coding, tool use, search, and office tasks. The uncensored base MiniMax-M2.5-CARVE-v1-BF16 was created via CARVE-style uncensoring; this repository provides a 4-bit quantized version for lower memory use and faster inference while retaining the same architecture and chat format.
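As a rough illustration of why the 4-bit quantization matters, a weights-only memory estimate is just parameters × bits ÷ 8. This is a back-of-the-envelope sketch only; real usage is higher once quantization scales, activations, and the KV cache are included:

```python
# Rough, weights-only memory estimate for a 229B-parameter model
# at different precisions. Quantization metadata (scales/zeros),
# activations, and KV cache add further overhead on top of this.
PARAMS = 229e9  # parameter count of MiniMax-M2.5

def weights_gb(bits_per_param: float) -> float:
    """Weights-only footprint in GB for a given precision."""
    return PARAMS * bits_per_param / 8 / 1e9

bf16 = weights_gb(16)  # ~458 GB, impractical on most single machines
q4 = weights_gb(4)     # ~114.5 GB, feasible on high-memory Apple Silicon
print(f"BF16: {bf16:.1f} GB, 4-bit: {q4:.1f} GB")
```
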

Intended use

This model is intended for users who want:

  • The capabilities of MiniMax-M2.5 in a smaller, faster 4-bit form.
  • An uncensored variant for research, creative writing, or unrestricted-assistant use cases.

Disclaimer: This is an uncensored model. Outputs can be unfiltered. Use responsibly and in accordance with your local laws and policies.

Installation

The recommended way to run this model is with mlx-openai-server on Apple Silicon.

Install mlx-openai-server

python3 -m venv .venv
source .venv/bin/activate
pip install -U mlx-openai-server

To use the latest development version instead:

pip install -U git+https://github.com/cubist38/mlx-openai-server.git

Launch the model

mlx-openai-server launch --model-path MiniMax-M2.5-Uncensored-4bit --model-type lm --reasoning-parser minimax_m2 --tool-call-parser minimax_m2 --trust-remote-code

This exposes an OpenAI-compatible API at http://localhost:8000/v1, making the model easy to use from existing OpenAI SDKs, apps, and agent frameworks.
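Any OpenAI-compatible client can talk to this endpoint. As a minimal sketch using only the Python standard library (the payload fields follow the standard chat-completions schema; this assumes the server launched above is running on localhost:8000):

```python
# Minimal client for the OpenAI-compatible endpoint exposed by
# mlx-openai-server, using only the Python standard library.
import json
import urllib.request

def build_chat_request(messages, model="MiniMax-M2.5-Uncensored-4bit",
                       max_tokens=256, temperature=1.0):
    """Build the JSON body for a /v1/chat/completions request."""
    return {
        "model": model,
        "messages": messages,
        "max_tokens": max_tokens,
        "temperature": temperature,
    }

def chat(messages, base_url="http://localhost:8000/v1"):
    """POST a chat request and return the assistant's reply text."""
    body = json.dumps(build_chat_request(messages)).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        reply = json.load(resp)
    return reply["choices"][0]["message"]["content"]

if __name__ == "__main__":
    print(chat([{"role": "user", "content": "Hello, how are you?"}]))
```

The same endpoint also works with the official OpenAI SDKs by pointing base_url at http://localhost:8000/v1.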

How to use

This is an MLX model for Apple Silicon. The recommended serving path is mlx-openai-server, and you can also run it directly with mlx_lm.

Python (mlx_lm)

from mlx_lm import load, generate
from mlx_lm.sample_utils import make_sampler

model_path = "MiniMax-M2.5-Uncensored-4bit"  # or your local path
model, tokenizer = load(
    model_path,
    tokenizer_config={"trust_remote_code": True},
)

# Recent mlx_lm versions take sampling settings via a sampler object
# rather than temp/top_p keyword arguments on generate().
sampler = make_sampler(temp=1.0, top_p=0.95)

response = generate(
    model,
    tokenizer,
    prompt="Hello, how are you?",
    max_tokens=256,
    sampler=sampler,
    verbose=True,
)
print(response)

For chat, format messages with the model’s chat template before passing the resulting string as prompt, for example via tokenizer.apply_chat_template(messages, add_generation_prompt=True, tokenize=False), which renders the repo’s chat_template.jinja.

Server (mlx-openai-server)

mlx-openai-server launch --model-path MiniMax-M2.5-Uncensored-4bit --model-type lm --reasoning-parser minimax_m2 --tool-call-parser minimax_m2 --trust-remote-code

mlx-openai-server is the best fit if you want OpenAI-compatible endpoints, streaming responses, structured outputs, reasoning/tool-call parsing, and easy integration with existing clients.
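For streaming, OpenAI-compatible servers emit server-sent events ("data: ..." lines) when "stream": true is set in the request. A minimal stdlib sketch of a streaming client (the parsing helper assumes the standard chat-completions chunk schema; the server from the launch command above is assumed to be running):

```python
# Sketch: stream tokens from the OpenAI-compatible chat endpoint
# using only the Python standard library.
import json
import urllib.request

def parse_sse_line(line: str):
    """Extract the content delta from one SSE line, or None.

    Chunks follow the standard chat-completions streaming schema:
    data: {"choices": [{"delta": {"content": "..."}}]}
    """
    line = line.strip()
    if not line.startswith("data: "):
        return None
    payload = line[len("data: "):]
    if payload == "[DONE]":
        return None
    chunk = json.loads(payload)
    return chunk["choices"][0]["delta"].get("content")

def stream_chat(messages, base_url="http://localhost:8000/v1",
                model="MiniMax-M2.5-Uncensored-4bit"):
    """Print assistant tokens as they arrive from the server."""
    body = json.dumps(
        {"model": model, "messages": messages, "stream": True}
    ).encode("utf-8")
    req = urllib.request.Request(
        f"{base_url}/chat/completions",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        for raw in resp:
            delta = parse_sse_line(raw.decode("utf-8"))
            if delta:
                print(delta, end="", flush=True)
    print()

if __name__ == "__main__":
    stream_chat([{"role": "user", "content": "Hello, how are you?"}])
```
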

License

Follow the license terms of the original MiniMax-M2.5 model (Modified-MIT). See MiniMax-M2.5 and the MiniMax-M2.5 GitHub LICENSE for details.
