---
license: other
license_name: flux-dev-non-commercial-license
license_link: https://huggingface.co/black-forest-labs/FLUX.2-dev/blob/main/LICENSE.txt
language:
- en
base_model: black-forest-labs/FLUX.2-dev
tags:
- image-generation
- image-editing
- text-to-image
- flux2
- flux
- diffusers
- transformers
- quantization
- hqq
- optimization
- quantized
- gguf
- 2bit
pipeline_tag: image-to-image
library_name: diffusers
base_model_relation: quantized
---
## FLUX.2-dev 2-bit HQQ (Half-Quadratic Quantization)
2-bit quantized variant of [FLUX.2-dev by Black Forest Labs](https://huggingface.co/black-forest-labs/FLUX.2-dev), produced with the [HQQ](https://github.com/dropbox/hqq) (Half-Quadratic Quantization) toolkit. <br>
All of the linear layers in the Transformer and Text Encoder (Mistral Small 3.1) components have been replaced with 2-bit HQQ approximations of the original weights. <br>
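For reference, this is roughly how each layer gets converted (a minimal per-layer sketch; the layer dimensions below are illustrative, not taken from the actual model):
```
import torch
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig

# Same settings as this checkpoint: 2-bit weights, 64-element quantization groups
quant_config = BaseQuantizeConfig(nbits=2, group_size=64, axis=1)

# Stand-in for one of the model's linear layers (sizes are illustrative)
linear = torch.nn.Linear(4096, 4096, bias=False)

# initialize=True (the default) quantizes the float weights immediately
hqq_layer = HQQLinear(
    linear,
    quant_config=quant_config,
    compute_dtype=torch.bfloat16,
    device="cuda",
)

# The forward pass dequantizes on the fly
x = torch.randn(1, 4096, dtype=torch.bfloat16, device="cuda")
y = hqq_layer(x)
```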
To use, make sure to install the following libraries:
```
pip install git+https://github.com/huggingface/diffusers.git@main
pip install "transformers>=4.53.1"
pip install -U hqq
pip install accelerate huggingface_hub safetensors
```
Plus `torch`, naturally, compiled/installed however suits your device.
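Optionally, a quick sanity check that the stack imports cleanly and a CUDA device is visible:
```
import torch, diffusers, transformers, hqq

print("torch:", torch.__version__)
print("diffusers:", diffusers.__version__)
print("transformers:", transformers.__version__)
print("CUDA available:", torch.cuda.is_available())
```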
## INFERENCE
*(Sorry, but you may have to re-construct the pipe on-the-fly, as they say...)*
```
import torch
from diffusers import Flux2Pipeline, Flux2Transformer2DModel
from transformers import AutoModel
from hqq.core.quantize import HQQLinear, BaseQuantizeConfig
from huggingface_hub import hf_hub_download
from safetensors.torch import load_file


def replace_with_hqq(model, quant_config):
    """
    Recursively replace nn.Linear layers with (uninitialized) HQQLinear layers.
    This must match the exact logic used during quantization, so that the
    quantized state dict lines up with the module tree.
    """
    for name, child in model.named_children():
        if isinstance(child, torch.nn.Linear):
            # Create an empty HQQ layer; quantized weights are loaded afterwards
            hqq_layer = HQQLinear(
                child,
                quant_config=quant_config,
                compute_dtype=torch.bfloat16,
                device="cuda",
                initialize=False,
            )
            setattr(model, name, hqq_layer)
        else:
            replace_with_hqq(child, quant_config)


# Must match the settings used to produce this checkpoint: 2-bit, group size 64
hqq_config = BaseQuantizeConfig(
    nbits=2,
    group_size=64,
    axis=1,
)

model_id = "AlekseyCalvin/FLUX2_dev_2bit_hqq"

print("Loading Text Encoder (Mistral)...")
# Initialize the skeleton from the base model's config
text_encoder = AutoModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="text_encoder",
    torch_dtype=torch.bfloat16,
)
# Swap nn.Linear layers for HQQ layers, then load the quantized weights
replace_with_hqq(text_encoder, hqq_config)
te_path = hf_hub_download(model_id, filename="text_encoder/model.safetensors")
te_state_dict = load_file(te_path)
text_encoder.load_state_dict(te_state_dict)
text_encoder = text_encoder.to("cuda")

print("Loading Transformer (Flux 2)...")
# Same procedure for the diffusion transformer
transformer = Flux2Transformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    subfolder="transformer",
    torch_dtype=torch.bfloat16,
)
replace_with_hqq(transformer, hqq_config)
tr_path = hf_hub_download(model_id, filename="transformer/diffusion_pytorch_model.safetensors")
tr_state_dict = load_file(tr_path)
transformer.load_state_dict(tr_state_dict)
transformer = transformer.to("cuda")

print("Assembling Pipeline...")
# Remaining components (VAE, tokenizer, scheduler) come from the base repo
pipe = Flux2Pipeline.from_pretrained(
    "black-forest-labs/FLUX.2-dev",
    transformer=transformer,
    text_encoder=text_encoder,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()
print("Ready for Inference!")

prompt = "A photo of a sneaky koala hiding behind book stacks at a library, calm snowy landscape visible through large window in the backdrop..."
image = pipe(prompt, guidance_scale=4, num_inference_steps=40).images[0]
image.save("KoalaTesting.png")
```
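To see what the 2-bit weights actually buy you in memory, you can measure peak VRAM around a generation with plain `torch` APIs (nothing here is specific to this repo):
```
import torch

torch.cuda.reset_peak_memory_stats()
image = pipe(prompt, guidance_scale=4, num_inference_steps=40).images[0]
print(f"Peak VRAM: {torch.cuda.max_memory_allocated() / 1024**3:.2f} GiB")
```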
If the above doesn't work, try the inference method at the [HQQ Git Repo](https://github.com/dropbox/hqq)... <br>
If neither works, please leave a comment. I will do more testing soon and revise if need be. <br>
Crucially: HQQ should work with PEFT/LoRA inference + training; a sketch follows below. <br>
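For instance, the HQQ repo ships its own PEFT helpers that attach trainable low-rank adapters directly onto `HQQLinear` layers. The sketch below is untested and based on the example in the HQQ README; the `lora_params` keys must match module-name suffixes in the actual model (inspect `transformer.named_modules()`), so the attention-projection names used here are assumptions:
```
import torch
from hqq.core.peft import PeftUtils

# Shared LoRA hyperparameters; values here are illustrative
base_lora_params = {
    "lora_type": "default",
    "r": 16,
    "lora_alpha": 16,
    "dropout": 0.05,
    "train_dtype": torch.bfloat16,
}

# Keys are module-name suffixes to adapt; these names are assumed,
# not verified against the FLUX.2 transformer's module tree
lora_params = {
    "to_q": base_lora_params,
    "to_k": base_lora_params,
    "to_v": base_lora_params,
}

# Wrap matching HQQLinear layers with trainable adapters
PeftUtils.add_lora(transformer, lora_params)
```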
## MORE INFO:
[HQQ docs](https://huggingface.co/docs/transformers/en/quantization/hqq) at Hugging Face. <br>
[HQQ git repo](https://github.com/dropbox/hqq?tab=readme-ov-file) with further info and code. <br>
[Blog post about HQQ](https://dropbox.tech/machine-learning/halfquadratic-quantization-of-large-machine-learning-models), originally published by the Mobius Labs team (now hosted at dropbox.tech).<br>