arxiv:2604.12322

Self-Adversarial One Step Generation via Condition Shifting

Published on Apr 14
· Submitted by
Deyuan Liu
on Apr 15

Abstract

APEX enables efficient one-step text-to-image synthesis by eliminating external discriminators through endogenous gradient estimation from flow models, achieving superior quality and speed compared to existing methods.

AI-generated summary

The push for efficient text-to-image synthesis has moved the field toward one-step sampling, yet existing methods still face a three-way tradeoff among fidelity, inference speed, and training efficiency. Approaches that rely on external discriminators can sharpen one-step performance, but they often introduce training instability, high GPU memory overhead, and slow convergence, which complicates scaling and parameter-efficient tuning. In contrast, regression-based distillation and consistency objectives are easier to optimize, but they typically lose fine details when constrained to a single step. We present APEX, built on a key theoretical insight: adversarial correction signals can be extracted endogenously from a flow model through condition shifting. An affine transformation creates a shifted condition branch whose velocity field serves as an independent estimator of the model's current generation distribution, yielding a gradient that is provably GAN-aligned and replacing the sample-dependent discriminator terms that cause gradient vanishing. This discriminator-free design is architecture-preserving, making APEX a plug-and-play framework compatible with both full-parameter and LoRA-based tuning. Empirically, our 0.6B model surpasses FLUX-Schnell 12B (20× more parameters) in one-step quality. With LoRA tuning on Qwen-Image 20B, APEX reaches a GenEval score of 0.89 at NFE=1 in 6 hours, surpassing the original 50-step teacher (0.87) and providing a 15.33× inference speedup. Code is available at https://github.com/LINs-lab/APEX.
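The condition-shifting idea described in the abstract can be sketched in a few lines. This is a toy illustration, not the paper's implementation: `velocity`, `condition_shift`, and the shift parameters `alpha`/`beta` below are hypothetical stand-ins for the model's velocity field and the affine shift the paper defines.

```python
# Toy sketch of self-adversarial correction via condition shifting.
# All names and shift parameters are hypothetical stand-ins; the
# actual APEX objective is defined in the paper.

def velocity(x: float, t: float, cond: float) -> float:
    """Stand-in for a conditional flow model's velocity field v(x, t | c)."""
    return (cond - x) * (1.0 - t)

def condition_shift(cond: float, alpha: float = 0.5, beta: float = 0.1) -> float:
    """Hypothetical affine shift c' = alpha * c + beta, creating the shifted branch."""
    return alpha * cond + beta

def self_adversarial_correction(x: float, t: float, cond: float) -> float:
    # Velocity under the original condition.
    v_orig = velocity(x, t, cond)
    # Velocity under the shifted condition: per the abstract, this branch
    # acts as an endogenous estimator of the model's current generation
    # distribution, so no external discriminator is needed.
    v_shifted = velocity(x, t, condition_shift(cond))
    # Their difference plays the role of a GAN-aligned correction signal.
    return v_orig - v_shifted

print(self_adversarial_correction(x=0.0, t=0.5, cond=1.0))  # 0.2
```

Because both branches are evaluated with the same frozen flow model, the correction signal comes "for free", with no discriminator parameters to train or store.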

Community

Unlocking GAN-level 1-step generation without the GAN! 🚀 Meet APEX: Self-Adversarial Training via Condition Shifting.

  1. Discriminator-Free Realism: We completely eliminate the need for unstable external discriminators! APEX generates its own adversarial correction signal endogenously using an affine condition shift. You get GAN-aligned gradients with plug-and-play training.
  2. Extreme Scalability: APEX supports both full-parameter SFT and LoRA tuning of the massive Qwen-Image-20B in just 6 hours, achieving a GenEval score of 0.89 at NFE=1.
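For readers unfamiliar with the LoRA tuning mentioned in point 2, here is a minimal sketch of the low-rank update idea itself (a generic illustration, not APEX's training code): instead of tuning a full weight matrix W, one trains a rank-r product B @ A and uses W' = W + B @ A.

```python
# Generic LoRA low-rank update sketch (pure-Python toy, unrelated to
# APEX's actual training code, which uses a deep-learning framework).

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def lora_update(W, B, A):
    """Effective weight W' = W + B @ A; rank = inner dimension of B and A."""
    delta = matmul(B, A)
    return [[w + d for w, d in zip(wr, dr)] for wr, dr in zip(W, delta)]

W = [[1.0, 0.0], [0.0, 1.0]]   # frozen base weight (2x2)
B = [[1.0], [0.0]]             # rank-1 factor (2x1)
A = [[0.5, 0.5]]               # rank-1 factor (1x2)
print(lora_update(W, B, A))    # [[1.5, 0.5], [0.0, 1.0]]
```

Only B and A are trained, which is why LoRA tuning of a 20B model fits in far less memory than full-parameter SFT.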

Check out these stunning 2-NFE samples generated by our APEX-Qwen-Image-20B-2512!


Code is already available on our GitHub. We are also exploring APEX for video; stay tuned!


Get this paper in your agent:

hf papers read 2604.12322
Don't have the latest CLI?
curl -LsSf https://hf.co/cli/install.sh | bash
