---
license: other
license_name: flux-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
language:
- en
base_model: black-forest-labs/FLUX.2-dev
tags:
- image-generation
- image-editing
- text-to-image
- flux2
- flux
- diffusers
- transformers
- quantization
- hqq
- optimization
- quantized
- 2bit
pipeline_tag: image-to-image
library_name: diffusers
base_model_relation: quantized
---

## FLUX.2-dev 2-bit HQQ (Half-Quadratic Quantization)

A 2-bit quantized variant of [FLUX.2-dev by Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-dev), compressed using the [HQQ](https://github.com/dropbox/hqq) toolkit.
All linear layers in the Transformer and Text Encoder (Mistral Small 3.1) components have been replaced with 2-bit HQQ-quantized equivalents.
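For intuition, here is a minimal sketch of what that per-layer replacement amounts to for a single `nn.Linear` (the layer shape is illustrative; the quantization settings match this repo's config):

```python
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

# Illustrative layer shape; quant settings mirror this repo (2-bit, group_size=64, axis=1)
linear = torch.nn.Linear(4096, 4096, bias=False)
quant_config = BaseQuantizeConfig(nbits=2, group_size=64, axis=1)

# With the default initialize=True, the weights are quantized immediately
hqq_layer = HQQLinear(linear, quant_config=quant_config,
                      compute_dtype=torch.bfloat16, device="cuda")

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
y = hqq_layer(x)  # weights are dequantized on the fly in the forward pass
```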
To use, make sure to install the following libraries:

```shell
pip install git+https://github.com/huggingface/diffusers.git@main
pip install "transformers>=4.53.1"
pip install -U hqq
pip install accelerate huggingface_hub safetensors
```

Plus `torch`, naturally, however you might compile/install it for your device.

# INFERENCE

*(Sorry, but you may have to reconstruct the pipe on-the-fly, as they say...)*

```python
import torch
from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from transformers import AutoModel
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


def replace_with_hqq(model, quant_config):
    """
    Recursively replaces nn.Linear layers with HQQLinear layers.
    This must match the exact logic used during quantization.
    """
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Linear):
            # Create an empty (uninitialized) HQQ layer; the quantized
            # weights are loaded from the state dict afterwards.
            hqq_layer = HQQLinear(
                child,
                quant_config=quant_config,
                compute_dtype=torch.bfloat16,
                device="cuda",
                initialize=False,
            )
            setattr(model, name, hqq_layer)
        else:
            replace_with_hqq(child, quant_config)


# Must match the settings used to produce this repo's checkpoints
hqq_config = BaseQuantizeConfig(
    nbits=2,
    group_size=64,
    axis=1,
)

model_id = "AlekseyCalvin/FLUX2_dev_2bit_hqq"

print("Loading Text Encoder (Mistral)...")
# Initialize the skeleton from the base model's config
text_encoder = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16,
)
# Swap nn.Linear layers for HQQLinear
replace_with_hqq(text_encoder, hqq_config)
# Load the quantized weights
te_path = hf_hub_download(model_id, filename="text_encoder/model.safetensors")
te_state_dict = load_file(te_path)
text_encoder.load_state_dict(te_state_dict)
text_encoder = text_encoder.to("cuda")

print("Loading Transformer (FLUX.2)...")
# Initialize the skeleton
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
# Swap nn.Linear layers for HQQLinear
replace_with_hqq(transformer, hqq_config)
# Load the quantized weights
tr_path = hf_hub_download(model_id, filename="transformer/diffusion_pytorch_model.safetensors")
tr_state_dict = load_file(tr_path)
transformer.load_state_dict(tr_state_dict)
transformer = transformer.to("cuda")

print("Assembling Pipeline...")
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()

print("Ready for Inference!")
prompt = "A photo of a sneaky koala hiding behind book stacks at a library, calm snowy landscape visible through large window in the backdrop..."
image = pipe(prompt, guidance_scale=4, num_inference_steps=40).images[0]
image.save("KoalaTesting.png")
```

If the above doesn't work, try the inference method at the [HQQ Git Repo](https://github.com/dropbox/hqq)...
If neither works, please leave a comment. I will do more testing soon and revise if need be.
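Beyond plain text-to-image, FLUX.2 also supports image editing (hence the `image-to-image` pipeline tag above). A hedged sketch of editing with the same assembled pipe, assuming diffusers' `Flux2Pipeline` accepts reference images via its `image` argument; the input path is a placeholder:

```python
from diffusers.utils import load_image

# Placeholder input; point this at your own reference image
ref = load_image("my_input.png")
edited = pipe(
    prompt="Give the koala a red scarf, keep everything else unchanged",
    image=[ref],  # FLUX.2 takes one or more reference images
    guidance_scale=4,
    num_inference_steps=40,
).images[0]
edited.save("KoalaEdited.png")
```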
Crucially, HQQ should remain compatible with PEFT/LoRA for both inference and training, as sketched below.
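A hedged sketch using HQQ's own LoRA utilities (`hqq.core.peft.PeftUtils`), which attach low-rank adapters directly onto `HQQLinear` layers. It reuses the `text_encoder` from the inference snippet; the layer tags and adapter settings below are illustrative placeholders, not tested against this checkpoint:

```python
import torch
from hqq.core.peft import PeftUtils

# Illustrative adapter settings; tune rank/alpha/dropout for your task
base_lora = {"lora_type": "default", "r": 16, "lora_alpha": 32,
             "dropout": 0.05, "train_dtype": torch.bfloat16}

# Keys must match actual module names inside the component you adapt;
# these tags are hypothetical examples for the text encoder's attention.
lora_params = {
    "self_attn.q_proj": base_lora,
    "self_attn.v_proj": base_lora,
}
PeftUtils.add_lora(text_encoder, lora_params)
# PeftUtils also provides merge/save/load helpers for the adapters.
```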
# MORE INFO:
[HQQ doc](https://huggingface.co/docs/transformers/en/quantization/hqq) at Hugging Face.
[HQQ git repo](https://github.com/dropbox/hqq?tab=readme-ov-file) with further info and code.
[Blog post about HQQ](https://dropbox.tech/machine-learning/halfquadratic-quantization-of-large-machine-learning-models) originally published by the Mobius Labs team (now hosted at dropbox.tech).