z-image-turbo (GGUF Version) for ComfyUI

This repository contains GGUF-quantized weights for the z-image-turbo model, allowing it to run in ComfyUI on systems with limited VRAM (though the workflow is still demanding).

The goal of this upload is to make this pipeline runnable by leveraging the efficiency of the GGUF format for both the UNET and the text encoder (Qwen). If you only need the workflow, feel free to download just the JSON file from the files and versions tab.


Example comparison (three sample images).

📂 Files and Structure

The models are organized within the repository folders as follows:

  • UNET: models/unet/z_image_turbo-Q8_0.gguf

    • Q8 quantized version of the main diffusion model.
  • Text Encoder: models/text_encoders/Qwen3-4B-UD-Q5_K_XL.gguf

    • Qwen3 4B LLM quantized in Q5, used for prompt processing.
  • VAE: models/vae/ae.safetensors

    • Standard Variational Autoencoder for decoding the image.
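If you prefer to script the downloads, here is a minimal sketch using the huggingface_hub library. The repo id below is a placeholder (this repository's actual id is not restated here); the file paths match the layout listed above.

```python
# Minimal sketch: fetch the three model files with huggingface_hub.
# REPO_ID is a hypothetical placeholder; replace it with this repository's id.
from huggingface_hub import hf_hub_download

REPO_ID = "your-username/z-image-turbo-GGUF"  # placeholder, not the real id

files = [
    "models/unet/z_image_turbo-Q8_0.gguf",            # quantized diffusion model
    "models/text_encoders/Qwen3-4B-UD-Q5_K_XL.gguf",  # quantized text encoder
    "models/vae/ae.safetensors",                      # standard VAE
]

for filename in files:
    path = hf_hub_download(repo_id=REPO_ID, filename=filename)
    print(f"Downloaded {filename} -> {path}")
```

The downloaded files then just need to be moved into the ComfyUI folders described in the installation steps below.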

βš™οΈ Installation in ComfyUI

To use these models, you will need custom nodes that support GGUF loading (such as city96's ComfyUI-GGUF or similar).

  1. Download the files (manually or with the download sketch above) and place them as follows:

    • Move the .gguf file from the unet folder to: ComfyUI/models/unet/
    • Move the .gguf file from the text_encoders folder to: ComfyUI/models/clip/ (or ComfyUI/models/text_encoders/, depending on your loader node).
    • Move the .safetensors file from the vae folder to: ComfyUI/models/vae/
  2. Recommended Nodes:

    • Use UnetLoaderGGUF to load z_image_turbo-Q8_0.gguf.
    • Use a GGUF-compatible CLIP/Text Encoder Loader to load Qwen3-4B.
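Once the nodes are installed and the files are in place, you can queue the workflow from the ComfyUI web interface, or programmatically. Below is a minimal sketch that posts an API-format workflow JSON to a locally running ComfyUI server; the default address 127.0.0.1:8188 and the filename workflow_api.json are assumptions, and the JSON must be exported via "Save (API Format)" in the UI.

```python
# Minimal sketch: queue an exported (API-format) workflow against a local
# ComfyUI server. Assumes ComfyUI is running on its default port 8188 and
# that "workflow_api.json" was exported with "Save (API Format)".
import json
import urllib.request

with open("workflow_api.json", "r", encoding="utf-8") as f:
    workflow = json.load(f)

payload = json.dumps({"prompt": workflow}).encode("utf-8")
req = urllib.request.Request(
    "http://127.0.0.1:8188/prompt",
    data=payload,
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    # The server responds with JSON that includes the queued prompt id.
    print(resp.read().decode("utf-8"))
```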

💻 Hardware Requirements and Performance

⚠️ Warning: This is a heavy workflow.

Even with GGUF quantization, the model requires considerable hardware due to the size of the text encoder and the UNET.

  • Minimum GPU: 12GB VRAM (NVIDIA RTX 3060/4070 or higher).
  • System RAM: 32GB recommended (the system may need to offload model weights to RAM).
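Before running, you can sanity-check your GPU against the 12GB figure above with PyTorch (which ComfyUI already ships with). This is a small illustrative sketch, not part of the workflow itself.

```python
# Minimal sketch: check that the GPU meets the stated 12 GB VRAM minimum.
import torch

if not torch.cuda.is_available():
    raise SystemExit("No CUDA GPU detected; this workflow requires one.")

total_vram_gb = torch.cuda.get_device_properties(0).total_memory / 1024**3
print(f"GPU: {torch.cuda.get_device_name(0)}, VRAM: {total_vram_gb:.1f} GB")
if total_vram_gb < 12:
    print("Warning: below the 12 GB minimum; expect heavy offloading to RAM.")
```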

Generation Time

On a GPU with 12GB VRAM, the estimated generation time per image ranges between:

  • 15 to 30 seconds, which is considerably fast for a pipeline of this size.

Model Information

Check out the original Z-Image Turbo model card for detailed information about the model.
