Instructions to use neuralnets/fractal_r1_4bit_q with libraries, inference providers, notebooks, and local apps.

Libraries

How to use neuralnets/fractal_r1_4bit_q with Transformers:

# Use a pipeline as a high-level helper
from transformers import pipeline

pipe = pipeline("text-generation", model="neuralnets/fractal_r1_4bit_q")
messages = [
    {"role": "user", "content": "Who are you?"},
]
pipe(messages)

# Load model directly
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("neuralnets/fractal_r1_4bit_q")
model = AutoModelForCausalLM.from_pretrained("neuralnets/fractal_r1_4bit_q")
messages = [
    {"role": "user", "content": "Who are you?"},
]
inputs = tokenizer.apply_chat_template(
    messages,
    add_generation_prompt=True,
    tokenize=True,
    return_dict=True,
    return_tensors="pt",
).to(model.device)

outputs = model.generate(**inputs, max_new_tokens=40)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[-1]:]))

Local Apps

vLLM

How to use neuralnets/fractal_r1_4bit_q with vLLM:

Install from pip and serve model

# Install vLLM from pip:
pip install vllm
# Start the vLLM server:
vllm serve "neuralnets/fractal_r1_4bit_q"
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:8000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuralnets/fractal_r1_4bit_q",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'
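
The same chat-completions call can also be made from Python. The following is a minimal sketch using only the standard library, assuming the server started above is listening on http://localhost:8000 and returns the usual OpenAI-style response shape. The helper names `build_chat_request` and `chat` are hypothetical and not part of vLLM.

```python
import json
import urllib.request


def build_chat_request(base_url, model, user_message):
    """Build (but do not send) a POST request for /v1/chat/completions."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": user_message}],
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def chat(base_url, model, user_message, timeout=60):
    """Send the request and return the assistant's reply text."""
    request = build_chat_request(base_url, model, user_message)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        body = json.load(response)
    # Assumption: the reply follows the OpenAI-compatible layout,
    # with the text under choices[0].message.content.
    return body["choices"][0]["message"]["content"]


# Example call (requires a running server):
# print(chat("http://localhost:8000", "neuralnets/fractal_r1_4bit_q",
#            "What is the capital of France?"))
```

Keeping request construction separate from sending makes it possible to inspect the payload without a running server. Pointing `base_url` at a different port should be enough to reuse the helper with another OpenAI-compatible server.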

Use Docker

docker model run hf.co/neuralnets/fractal_r1_4bit_q

SGLang

How to use neuralnets/fractal_r1_4bit_q with SGLang:

Install from pip and serve model

# Install SGLang from pip:
pip install sglang
# Start the SGLang server:
python3 -m sglang.launch_server \
    --model-path "neuralnets/fractal_r1_4bit_q" \
    --host 0.0.0.0 \
    --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuralnets/fractal_r1_4bit_q",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Use Docker images

docker run --gpus all \
    --shm-size 32g \
    -p 30000:30000 \
    -v ~/.cache/huggingface:/root/.cache/huggingface \
    --env "HF_TOKEN=<secret>" \
    --ipc=host \
    lmsysorg/sglang:latest \
    python3 -m sglang.launch_server \
        --model-path "neuralnets/fractal_r1_4bit_q" \
        --host 0.0.0.0 \
        --port 30000
# Call the server using curl (OpenAI-compatible API):
curl -X POST "http://localhost:30000/v1/chat/completions" \
	-H "Content-Type: application/json" \
	--data '{
		"model": "neuralnets/fractal_r1_4bit_q",
		"messages": [
			{
				"role": "user",
				"content": "What is the capital of France?"
			}
		]
	}'

Docker Model Runner
How to use neuralnets/fractal_r1_4bit_q with Docker Model Runner:
```
docker model run hf.co/neuralnets/fractal_r1_4bit_q
```

Model is not supporting by TGI

by maheshbabu9199 - opened May 27, 2025

maheshbabu9199, May 27, 2025 (edited May 27, 2025):

When I try to load this model with TGI, I get the following error:

RuntimeError: [FT][ERROR] Invalid shape for quantized tensor. Number of rows of quantized matrix must be a multiple of 16 Assertion fail: /build/source/cutlass_kernels/cutlass_preprocessors.cc:164
text-generation-inference | 2025-05-27T08:01:55.437641Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output.

Has anyone gotten this model working with TGI?
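
The assertion in the log complains about a row count that is not a multiple of 16. One way to narrow this down might be to list the 2-D weight shapes that fail that condition. The sketch below is only an illustration: the function name `rows_not_multiple_of` and the example shapes are hypothetical, and treating the first dimension as the "rows" the kernel checks is an assumption, not something the error message confirms.

```python
def rows_not_multiple_of(shapes, multiple=16):
    """Return {name: shape} for 2-D shapes whose first dimension
    is not a multiple of `multiple`.

    Assumption: the first dimension corresponds to the "rows"
    mentioned in the error message.
    """
    return {
        name: shape
        for name, shape in shapes.items()
        if len(shape) == 2 and shape[0] % multiple != 0
    }


# Hypothetical shapes for illustration only, not taken from the model.
example_shapes = {
    "layer.0.weight": (4096, 4096),
    "layer.1.weight": (4100, 4096),
    "layer.1.bias": (4096,),
}
print(rows_not_multiple_of(example_shapes))
# → {'layer.1.weight': (4100, 4096)}
```

With a loaded Transformers model, the input dictionary could be built as `{name: tuple(t.shape) for name, t in model.state_dict().items()}`. Packed 4-bit weights may be stored with different shapes than the logical matrices, so any hits are a starting point for investigation rather than a diagnosis.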

maheshbabu9199 changed discussion title from model is supported by tgi to Model is supporting by TGI May 27, 2025

maheshbabu9199 changed discussion title from Model is supporting by TGI to Model is not supporting by TGI May 27, 2025
