---
license: other
license_name: flux-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
language:
- en
base_model: black-forest-labs/FLUX.2-dev
tags:
- image-generation
- image-editing
- text-to-image
- flux2
- flux
- diffusers
- transformers
- quantization
- hqq
- optimization
- quantized
- gguf
- 2bit
pipeline_tag: image-to-image
library_name: diffusers
base_model_relation: quantized
---
## FLUX.2-dev 2-bit HQQ (Half-Quadratic Quantization)

A 2-bit quantized variant of [FLUX.2-dev by Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-dev), compacted with the [HQQ](https://github.com/dropbox/hqq) toolkit. <br>
All linear layers in the Transformer and Text Encoder (Mistral3-small) components have been replaced with 2-bit HQQ-quantized equivalents. <br>
To use, make sure to install the following libraries:
```
pip install git+https://github.com/huggingface/diffusers.git@main
pip install "transformers>=4.53.1"
pip install -U hqq
pip install accelerate huggingface_hub safetensors
```
Plus `torch`, naturally, however you might compile/install it for your device.
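For context, here is a minimal sketch of what that per-layer replacement looks like in HQQ, using the same 2-bit settings as the inference script below. The layer dimensions here are placeholders for illustration, not taken from the actual model:
```
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

# Same 2-bit settings reused in the inference script below.
quant_config = BaseQuantizeConfig(nbits=2, group_size=64, axis=1)

# Placeholder linear layer; during quantization every nn.Linear in the
# Transformer and Text Encoder was wrapped this way.
linear = torch.nn.Linear(4096, 4096, bias=False)
hqq_linear = HQQLinear(
    linear,
    quant_config=quant_config,
    compute_dtype=torch.bfloat16,
    device="cuda",
    initialize=True,  # quantize right away (the loader below uses initialize=False)
)

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
print(hqq_linear(x).shape)  # torch.Size([1, 4096])
```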
# INFERENCE
*(Sorry, but you may have to re-construct the pipe on-the-fly, as they say...)*

```
import torch
import hqq
from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from transformers import AutoModel
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

def replace_with_hqq(model, quant_config):
    """
    Recursively replaces nn.Linear layers with HQQLinear layers.
    This must match the exact logic used during quantization.
    """
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Linear):
            # Create empty HQQ layer
            hqq_layer = HQQLinear(
                child, 
                quant_config=quant_config, 
                compute_dtype=torch.bfloat16, 
                device="cuda", 
                initialize=False
            )
            setattr(model, name, hqq_layer)
        else:
            replace_with_hqq(child, quant_config)

hqq_config = BaseQuantizeConfig(
    nbits=2,
    group_size=64,
    axis=1 
)

model_id = "AlekseyCalvin/FLUX2_dev_2bit_hqq"

print("Loading Text Encoder (Mistral)...")
# Initialize skeleton
text_encoder = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev", # Load config from base model
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16
)
# Swap layers
replace_with_hqq(text_encoder, hqq_config)
# Load quantized weights
te_path = hf_hub_download(model_id, filename="text_encoder/model.safetensors")
te_state_dict = load_file(te_path)
text_encoder.load_state_dict(te_state_dict)
text_encoder = text_encoder.to("cuda")

print("Loading Transformer (Flux 2)...")
# Initialize skeleton
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev", 
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
# Swap layers
replace_with_hqq(transformer, hqq_config)
# Load quantized weights
tr_path = hf_hub_download(model_id, filename="transformer/diffusion_pytorch_model.safetensors")
tr_state_dict = load_file(tr_path)
transformer.load_state_dict(tr_state_dict)
transformer = transformer.to("cuda")

print("Assembling Pipeline...")
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

print("Ready for Inference!")
prompt = "A photo of a sneaky koala hiding behind book stacks at a library, calm snowy landscape visible through large window in the backdrop..."
image = pipe(prompt, guidance_scale=4, num_inference_steps=40).images[0]
image.save("KoalaTesting.png")
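
# Optional: rough check of peak GPU memory used during generation,
# to gauge the footprint of the 2-bit weights (requires CUDA).
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")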
```
If the above doesn't work, try the inference method at the [HQQ Git Repo](https://github.com/dropbox/hqq)... <br>
If neither works, please leave a comment. I will do more testing soon and revise if need be. <br>
Crucially: HQQ should work with PEFT/LoRA inference + training. <br>
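As a rough, untested sketch only (the `target_modules` names are an assumption about the Flux2 transformer's attention projection names, and a recent `peft` release with HQQ support is assumed), injecting LoRA adapters into the quantized transformer from the script above might look like this:
```
from peft import LoraConfig, inject_adapter_in_model

# Hypothetical LoRA config; confirm the projection names via
# transformer.named_modules() for your diffusers version.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    lora_dropout=0.0,
)
transformer = inject_adapter_in_model(lora_config, transformer)

# Freeze everything except the injected LoRA matrices before training.
for name, param in transformer.named_parameters():
    param.requires_grad = "lora_" in name
```
Whether `peft` routes LoRA onto the `HQQLinear` layers depends on your `peft` version; treat this as a starting point rather than a verified recipe. <br>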

# MORE INFO:
[HQQ doc](https://huggingface.co/docs/transformers/en/quantization/hqq) at Hugging Face. <br>
[HQQ git repo](https://github.com/dropbox/hqq?tab=readme-ov-file) with further info and code. <br>
[Blog post about HQQ](https://dropbox.tech/machine-learning/halfquadratic-quantization-of-large-machine-learning-models), originally published by the Mobius team (reposted under Dropbox.tech). <br>