TwinFlow: Realizing One-step Generation on Large Models with Self-adversarial Flows
Abstract
TwinFlow is a 1-step generative model framework that enhances inference efficiency without requiring fixed pretrained teacher models or standard adversarial networks, achieving high performance on text-to-image tasks and scaling efficiently.
Recent advances in large multi-modal generative models have demonstrated impressive capabilities in image and video generation. These models are typically built upon multi-step frameworks such as diffusion and flow matching, which inherently limits their inference efficiency, typically requiring 40-100 function evaluations (NFEs). While various few-step methods aim to accelerate inference, existing solutions have clear limitations. Prominent distillation-based methods, such as progressive and consistency distillation, either require an iterative distillation procedure or show significant degradation at very few steps (<4 NFEs). Meanwhile, integrating adversarial training into distillation (e.g., DMD/DMD2 and SANA-Sprint) to enhance performance introduces training instability, added complexity, and high GPU memory overhead due to the auxiliary trained models. To this end, we propose TwinFlow, a simple yet effective framework for training 1-step generative models that bypasses the need for fixed pretrained teacher models and avoids standard adversarial networks during training, making it ideal for building large-scale, efficient models. On text-to-image tasks, our method achieves a GenEval score of 0.83 at 1 NFE, outperforming strong baselines such as SANA-Sprint (a GAN loss-based framework) and RCGM (a consistency-based framework). Notably, we demonstrate the scalability of TwinFlow by full-parameter training on Qwen-Image-20B, transforming it into an efficient few-step generator. With just 1 NFE, our approach matches the performance of the original 100-NFE model on both the GenEval and DPG-Bench benchmarks, reducing computational cost by 100× with only minor quality degradation. The project page is available at https://zhenglin-cheng.com/twinflow.
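For context on the NFE gap discussed above, the cost difference comes down to how many network calls are needed to turn noise into an image. The snippet below is a minimal, hypothetical sketch (not the paper's actual sampler) contrasting standard flow-matching sampling, which integrates a learned velocity field over many Euler steps, with 1-NFE sampling, where a single forward pass maps noise directly to a sample. The velocity-prediction interface `v_theta(x, t)` and the t=0 (noise) to t=1 (data) parameterization are assumptions.

```python
import torch

def sample_multistep(v_theta, x_noise, num_steps=100):
    """Standard flow-matching sampling: integrate dx/dt = v_theta(x, t)
    with Euler steps from t=0 (noise) to t=1 (data).
    One network call per step, so NFE == num_steps."""
    x = x_noise
    dt = 1.0 / num_steps
    for i in range(num_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + dt * v_theta(x, t)
    return x

def sample_one_step(v_theta, x_noise):
    """1-NFE sampling: a single forward pass maps noise to a sample,
    x1 ~= x0 + v_theta(x0, 0). This is only useful if the model was
    trained (e.g., by a few-step method such as TwinFlow) so that its
    one-step prediction already lands on the data distribution."""
    t = torch.zeros(x_noise.shape[0], device=x_noise.device)
    return x_noise + v_theta(x_noise, t)
```

Few-step training methods such as TwinFlow aim to make the second function viable, collapsing the roughly 100 network calls of the first into one.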
Community
Taming 20B full-parameter few-step training with self-adversarial flows!
- One-model Simplicity: We eliminate the need for auxiliary networks (discriminators, teachers, fake score estimators...), everything in one model!
- Scalability on Large Models: We transform Qwen-Image-20B into a high-quality few-step generator by full-parameter training (optimized for human figure generation!).
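As a rough illustration of why the one-model design matters at 20B scale, here is a hypothetical back-of-envelope estimate (not a measurement from the paper) of the training state that must be sharded when only the generator is trained versus when an additional trained 20B auxiliary network (e.g., a fake-score estimator or discriminator, as in DMD2-style setups) is kept alongside it. The byte counts assume bf16 weights/gradients and fp32 Adam moments, and ignore activations, EMA copies, and any frozen teacher.

```python
def trainable_state_gb(n_params_billion, n_trained_models,
                       bytes_weight=2, bytes_grad=2, bytes_adam=8):
    """Rough per-parameter footprint for full-parameter training:
    bf16 weights (2 B) + bf16 grads (2 B) + fp32 Adam moments (2 x 4 B).
    Parameters are given in billions, so the result is already in GB."""
    per_model_gb = n_params_billion * (bytes_weight + bytes_grad + bytes_adam)
    return per_model_gb * n_trained_models

# Single-model, self-adversarial training (TwinFlow-style): only the 20B generator is updated.
print(trainable_state_gb(20, n_trained_models=1))  # 240.0 GB of training state to shard

# Setup with an additional trained 20B auxiliary network (fake-score model / discriminator).
print(trainable_state_gb(20, n_trained_models=2))  # 480.0 GB of training state to shard
```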
Check out the 2-NFE images generated by our TwinFlow-Qwen-Image!
We are also working on Z-Image-Turbo; stay tuned!
Very nice paper!
Hope there will be one for **OnomaAIResearch/Illustrious-xl-early-release-v0**, gonna save us from 24/29 sampling steps for every gen.
This is an automated message from the Librarian Bot. I found the following papers similar to this paper.
The following papers were recommended by the Semantic Scholar API
- Towards One-step Causal Video Generation via Adversarial Self-Distillation (2025)
- Phased DMD: Few-step Distribution Matching Distillation via Score Matching within Subintervals (2025)
- Flash-DMD: Towards High-Fidelity Few-Step Image Generation with Efficient Distillation and Joint Reinforcement Learning (2025)
- Uni-DAD: Unified Distillation and Adaptation of Diffusion Models for Few-step Few-shot Image Generation (2025)
- GAS: Improving Discretization of Diffusion ODEs via Generalized Adversarial Solver (2025)
- There is No VAE: End-to-End Pixel-Space Generative Modeling via Self-Supervised Pre-training (2025)
- TReFT: Taming Rectified Flow Models For One-Step Image Translation (2025)
