MPQ-DMv2: Flexible Residual Mixed Precision Quantization for Low-Bit Diffusion Models with Temporal Distillation
Abstract
Diffusion models have demonstrated remarkable performance on vision generation tasks. However, the high computational complexity hinders its wide application on edge devices. Quantization has emerged as a promising technique for inference acceleration and memory reduction. However, existing quantization methods do not generalize well under extremely low-bit (2-4 bit) quantization. Directly applying these methods will cause severe performance degradation. We identify that the existing quantization framework suffers from the outlier-unfriendly quantizer design, suboptimal initialization, and optimization strategy. We present MPQ-DMv2, an improved Mixed Precision Quantization framework for extremely low-bit Diffusion Models. For the quantization perspective, the imbalanced distribution caused by salient outliers is quantization-unfriendly for uniform quantizer. We propose Flexible Z-Order Residual Mixed Quantization that utilizes an efficient binary residual branch for flexible quant steps to handle salient error. For the optimization framework, we theoretically analyzed the convergence and optimality of the LoRA module and propose Object-Oriented Low-Rank Initialization to use prior quantization error for informative initialization. We then propose Memory-based Temporal Relation Distillation to construct an online time-aware pixel queue for long-term denoising temporal information distillation, which ensures the overall temporal consistency between quantized and full-precision model. Comprehensive experiments on various generation tasks show that our MPQ-DMv2 surpasses current SOTA methods by a great margin on different architectures, especially under extremely low-bit widths.
Turn this paper into a full lesson
ArcXiv compiles a staged curriculum from this paper: 8-12 lessons across beginner → advanced, synthesised section guides, visuals, flashcards, a quiz, exercises, and on-demand deep dives per section. Grounded in the abstract, never invented.