---
license: other
license_name: flux-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
language:
- en
base_model: black-forest-labs/FLUX.2-dev
tags:
- image-generation
- image-editing
- text-to-image
- flux2
- flux
- diffusers
- transformers
- quantization
- hqq
- optimization
- quantized
- gguf
- 2bit
pipeline_tag: image-to-image
library_name: diffusers
base_model_relation: quantized
---
## FLUX.2-dev 2-bit HQQ (Half-Quadratic Quantization)

A 2-bit quantized variant of [FLUX.2-dev by Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-dev), compacted with the [HQQ](https://github.com/dropbox/hqq) toolkit. <br>
All linear layers in the Transformer and Text Encoder (Mistral3-small) components have been replaced with 2-bit HQQ-quantized equivalents. <br>
To use, make sure to install the following libraries:
```
pip install git+https://github.com/huggingface/diffusers.git@main
pip install "transformers>=4.53.1"
pip install -U hqq
pip install accelerate huggingface_hub safetensors
```
Plus `torch`, naturally, however you might compile/install it for your device.
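For context, here is a minimal sketch of what that per-layer replacement looks like in HQQ, using the same 2-bit settings as the inference script below. The layer dimensions here are placeholders for illustration, not taken from the actual model:
```
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

# Same 2-bit settings reused in the inference script below.
quant_config = BaseQuantizeConfig(nbits=2, group_size=64, axis=1)

# Placeholder linear layer; during quantization every nn.Linear in the
# Transformer and Text Encoder was wrapped this way.
linear = torch.nn.Linear(4096, 4096, bias=False)
hqq_linear = HQQLinear(
    linear,
    quant_config=quant_config,
    compute_dtype=torch.bfloat16,
    device="cuda",
    initialize=True,  # quantize right away (the loader below uses initialize=False)
)

x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
print(hqq_linear(x).shape)  # torch.Size([1, 4096])
```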
# INFERENCE
*(Sorry, but you may have to re-construct the pipe on-the-fly, as they say...)*

```
import torch
import hqq
from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from transformers import AutoModel
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file

def replace_with_hqq(model, quant_config):
    """
    Recursively replaces nn.Linear layers with HQQLinear layers.
    This must match the exact logic used during quantization.
    """
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Linear):
            # Create empty HQQ layer
            hqq_layer = HQQLinear(
                child, 
                quant_config=quant_config, 
                compute_dtype=torch.bfloat16, 
                device="cuda", 
                initialize=False
            )
            setattr(model, name, hqq_layer)
        else:
            replace_with_hqq(child, quant_config)

hqq_config = BaseQuantizeConfig(
    nbits=2,
    group_size=64,
    axis=1 
)

model_id = "AlekseyCalvin/FLUX2_dev_2bit_hqq"

print("Loading Text Encoder (Mistral)...")
# Initialize skeleton
text_encoder = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev", # Load config from base model
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16
)
# Swap layers
replace_with_hqq(text_encoder, hqq_config)
# Load quantized weights
te_path = hf_hub_download(model_id, filename="text_encoder/model.safetensors")
te_state_dict = load_file(te_path)
text_encoder.load_state_dict(te_state_dict)
text_encoder = text_encoder.to("cuda")

print("Loading Transformer (Flux 2)...")
# Initialize skeleton
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev", 
    subfolder="transformer",
    torch_dtype=torch.bfloat16
)
# Swap layers
replace_with_hqq(transformer, hqq_config)
# Load quantized weights
tr_path = hf_hub_download(model_id, filename="transformer/diffusion_pytorch_model.safetensors")
tr_state_dict = load_file(tr_path)
transformer.load_state_dict(tr_state_dict)
transformer = transformer.to("cuda")

print("Assembling Pipeline...")
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16
)
pipe.enable_model_cpu_offload()

print("Ready for Inference!")
prompt = "A photo of a sneaky koala hiding behind book stacks at a library, calm snowy landscape visible through large window in the backdrop..."
image = pipe(prompt, guidance_scale=4, num_inference_steps=40).images[0]
image.save("KoalaTesting.png")
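
# Optional: rough check of peak GPU memory used during generation,
# to gauge the footprint of the 2-bit weights (requires CUDA).
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")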
```
If the above doesn't work, try the inference method at the [HQQ Git Repo](https://github.com/dropbox/hqq)... <br>
If neither works, please leave a comment. I will do more testing soon and revise if need be. <br>
Crucially: HQQ should work with PEFT/LoRA inference + training. <br>
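As a rough, untested sketch only (the `target_modules` names are an assumption about the Flux2 transformer's attention projection names, and a recent `peft` release with HQQ support is assumed), injecting LoRA adapters into the quantized transformer from the script above might look like this:
```
from peft import LoraConfig, inject_adapter_in_model

# Hypothetical LoRA config; confirm the projection names via
# transformer.named_modules() for your diffusers version.
lora_config = LoraConfig(
    r=16,
    lora_alpha=16,
    target_modules=["to_q", "to_k", "to_v", "to_out.0"],
    lora_dropout=0.0,
)
transformer = inject_adapter_in_model(lora_config, transformer)

# Freeze everything except the injected LoRA matrices before training.
for name, param in transformer.named_parameters():
    param.requires_grad = "lora_" in name
```
Whether `peft` routes LoRA onto the `HQQLinear` layers depends on your `peft` version; treat this as a starting point rather than a verified recipe. <br>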

# MORE INFO:
[HQQ doc](https://huggingface.co/docs/transformers/en/quantization/hqq) at Hugging Face. <br>
[HQQ git repo](https://github.com/dropbox/hqq?tab=readme-ov-file) with further info and code. <br>
[Blog post about HQQ](https://dropbox.tech/machine-learning/halfquadratic-quantization-of-large-machine-learning-models), originally published by the Mobius team (reposted under Dropbox.tech). <br>