ACL: Activating Capability of Linear Attention for Image Restoration

CVPR2025

Abstract

Image restoration (IR), a cornerstone of computer vision, has embarked on a new epoch with the advent of deep learning technologies. Recently, numerous CNN and Transformer-based methods have been developed, yet they frequently encounter limitations in global receptive fields and computational efficiency. To mitigate these challenges, recent studies have employed the Selective State Space Model (Mamba), which offers both a global receptive field and computational efficiency. However, due to Mamba's inherent one-dimensional scanning limitations, some approaches have introduced multi-directional scanning to bolster inter-sequence correlations. Despite these enhancements, these methods still struggle to manage local pixel correlations across various directions. Moreover, the recursive computation in Mamba's SSM reduces efficiency. To resolve these issues, we exploit the mathematical correspondence between linear attention and the SSM within Mamba to propose a novel model based on a new design structure, ACL. This model uses linear attention blocks instead of the SSM within Mamba as the core component of its encoders and decoders, and aims to preserve a global perspective while boosting computational efficiency. Furthermore, we have designed a simple yet robust local enhancement module with multi-scale dilated convolutions that extracts coarse and fine features to improve local detail recovery. Experimental results confirm that our ACL model excels in classical IR tasks such as de-blurring and de-raining, while maintaining relatively low parameter counts and FLOPs.
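The efficiency argument in this abstract rests on a general property of linear attention: with a positive feature map in place of softmax, the product can be reordered so that keys and values are summarized once, costing O(N d^2) instead of O(N^2 d) for N tokens. The NumPy sketch below illustrates only that reordering. The feature map elu(x) + 1 and all function names are assumptions chosen for illustration; the abstract does not specify ACL's kernel or block design, so this is not the authors' implementation.

```python
import numpy as np

def feature_map(x):
    # elu(x) + 1, a positive feature map (assumed; not specified by the abstract)
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))

def linear_attention(q, k, v, eps=1e-6):
    """Non-causal linear attention. q, k: (N, d); v: (N, d_v)."""
    q, k = feature_map(q), feature_map(k)
    kv = k.T @ v                # (d, d_v) summary of keys and values
    z = q @ k.sum(axis=0)       # (N,) per-query normalizer
    return (q @ kv) / (z[:, None] + eps)

def quadratic_reference(q, k, v, eps=1e-6):
    """Same result computed through the explicit (N, N) attention matrix."""
    q, k = feature_map(q), feature_map(k)
    a = q @ k.T                 # (N, N) pairwise similarities
    return (a @ v) / (a.sum(axis=1, keepdims=True) + eps)
```

Both functions return the same values up to floating-point error; only `quadratic_reference` materializes the N x N matrix, which is what becomes expensive at image resolution.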


Complexity Experts are Task-Discriminative Learners for Any Image Restoration

https://github.com/eduardzamfir/MoCE-IR

CVPR2025

Abstract

Recent advancements in all-in-one image restoration models have revolutionized the ability to address diverse degradations through a unified framework. However, parameters tied to specific tasks often remain inactive for other tasks, making mixture-of-experts (MoE) architectures a natural extension. Despite this, MoEs often show inconsistent behavior, with some experts unexpectedly generalizing across tasks while others struggle within their intended scope. This makes it hard to exploit the main computational benefit of MoEs, namely bypassing irrelevant experts during inference. We attribute this undesired behavior to the uniform and rigid architecture of traditional MoEs. To address this, we introduce "complexity experts": flexible expert blocks with varying computational complexity and receptive fields. A key challenge is assigning tasks to each expert, as degradation complexity is unknown in advance. Thus, we execute tasks with a simple bias toward lower complexity. To our surprise, this preference effectively drives task-specific allocation, assigning tasks to experts with the appropriate complexity. Extensive experiments validate our approach, demonstrating the ability to bypass irrelevant experts during inference while maintaining superior performance. The proposed MoCE-IR model outperforms state-of-the-art methods, affirming its efficiency and practical applicability.
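One way to picture a "bias toward lower complexity" is a router whose scores are penalized in proportion to each expert's cost before the top expert is chosen. The sketch below is purely illustrative: the abstract does not say how the bias is implemented, and the function name `route`, the `costs` vector, and the `bias` weight are all hypothetical.

```python
import numpy as np

def route(logits, costs, bias=0.5):
    """Pick one expert per token after penalizing costly experts.

    logits: (tokens, experts) router scores.
    costs:  (experts,) relative complexity of each expert (hypothetical).
    bias:   strength of the preference for cheap experts (hypothetical).
    """
    logits = np.asarray(logits, dtype=float)
    costs = np.asarray(costs, dtype=float)
    penalized = logits - bias * costs / costs.max()
    return penalized.argmax(axis=1)
```

With `bias=0` this reduces to ordinary top-1 routing; a positive `bias` sends a token to the expensive expert only when its score is clearly higher.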


GenDeg: Diffusion-Based Degradation Synthesis for Generalizable All-in-One Image Restoration

https://github.com/sudraj2002/GenDeg

CVPR2025

Abstract

Deep learning-based models for All-In-One Image Restoration (AIOR) have achieved significant advancements in recent years. However, their practical applicability is limited by poor generalization to samples outside the training distribution. This limitation arises primarily from insufficient diversity in degradation variations and scenes within existing datasets, resulting in inadequate representations of real-world scenarios. Additionally, capturing large-scale real-world paired data for degradations such as haze, low-light, and raindrops is often cumbersome and sometimes infeasible. In this paper, we leverage the generative capabilities of latent diffusion models to synthesize high-quality degraded images from their clean counterparts. Specifically, we introduce GenDeg, a degradation- and intensity-aware conditional diffusion model capable of producing diverse degradation patterns on clean images. Using GenDeg, we synthesize over 550k samples across six degradation types: haze, rain, snow, motion blur, low-light, and raindrops. These generated samples are integrated with existing datasets to form the GenDS dataset, comprising over 750k samples. Our experiments reveal that image restoration models trained on the GenDS dataset exhibit significant improvements in out-of-distribution performance compared to those trained solely on existing datasets. Furthermore, we provide comprehensive analyses of the implications of diffusion model-based synthetic degradations for AIOR. The code will be made publicly available.


MaIR: A Locality- and Continuity-Preserving Mamba for Image Restoration

https://github.com/XLearning-SCU/2025-CVPR-MaIR

CVPR2025

Abstract

Recent advancements in Mamba have shown promising results in image restoration. These methods typically flatten 2D images into multiple distinct 1D sequences along rows and columns, process each sequence independently using a selective scan operation, and recombine them to form the outputs. However, such a paradigm overlooks two vital aspects: i) the local relationships and spatial continuity inherent in natural images, and ii) the discrepancies among sequences unfolded in entirely different ways. To overcome these drawbacks, we explore two problems in Mamba-based restoration methods: i) how to design a scanning strategy that preserves both locality and continuity while facilitating restoration, and ii) how to aggregate the distinct sequences unfolded in entirely different ways. To address these problems, we propose a novel Mamba-based Image Restoration model (MaIR), which consists of a Nested S-shaped Scanning strategy (NSS) and a Sequence Shuffle Attention block (SSA). Specifically, NSS preserves locality and continuity of the input images through the stripe-based scanning region and the S-shaped scanning path, respectively. SSA aggregates sequences by calculating attention weights within the corresponding channels of different sequences. Thanks to NSS and SSA, MaIR surpasses 40 baselines across 14 challenging datasets, achieving state-of-the-art performance on the tasks of image super-resolution, denoising, deblurring and dehazing. Our code will be available after acceptance.
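To make the idea of a stripe-based region with an S-shaped path concrete, the sketch below generates one possible scan order: the image is cut into horizontal stripes, each stripe is traversed column by column in a snake pattern, and the horizontal direction alternates between stripes. This is one reading of the abstract, not the paper's NSS. The function name, the `stripe` parameter, and the exact traversal are assumptions, and the actual method may nest or shift stripes differently.

```python
def s_shaped_scan(height, width, stripe):
    """Return (row, col) positions visiting every pixel once, stripe by stripe.

    Illustrative only: inside a stripe the path snakes up and down the
    columns, and consecutive stripes are traversed in opposite directions.
    """
    order = []
    for i, top in enumerate(range(0, height, stripe)):
        rows = list(range(top, min(top + stripe, height)))
        # Alternate the horizontal direction from one stripe to the next.
        cols = range(width) if i % 2 == 0 else range(width - 1, -1, -1)
        down = True
        for c in cols:
            for r in (rows if down else reversed(rows)):
                order.append((r, c))
            down = not down  # snake: flip the vertical direction per column
    return order
```

A flattened index for a 1D sequence model could then be taken as `r * width + c` for each position. In this sketch every step inside a stripe moves to a neighbouring pixel; the hop between stripes is also a neighbouring step when the width is odd.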


MambaIRv2: Attentive State Space Restoration

https://github.com/csguoh/MambaIR

CVPR2025

Abstract

Mamba-based image restoration backbones have recently demonstrated significant potential in balancing global reception and computational efficiency. However, the inherent causal modeling limitation of Mamba, where each token depends solely on its predecessors in the scanned sequence, restricts the full utilization of pixels across the image and thus presents new challenges in image restoration. In this work, we propose MambaIRv2, which equips Mamba with a non-causal modeling ability similar to that of ViTs, yielding an attentive state space restoration model. Specifically, the proposed attentive state-space equation allows attending beyond the scanned sequence and facilitates image unfolding with just a single scan. Moreover, we introduce a semantic-guided neighboring mechanism to encourage interaction between distant but similar pixels. Extensive experiments show that MambaIRv2 outperforms SRFormer by as much as 0.35 dB PSNR on lightweight SR with 9.3% fewer parameters, and surpasses HAT on classic SR by up to 0.29 dB.


OSDFace: One-Step Diffusion Model for Face Restoration

https://github.com/jkwang28/OSDFace

CVPR2025

Abstract

Diffusion models have demonstrated impressive performance in face restoration. Yet, their multi-step inference process remains computationally intensive, limiting their applicability in real-world scenarios. Moreover, existing methods often struggle to generate face images that are harmonious, realistic, and consistent with the subject's identity. In this work, we propose OSDFace, a novel one-step diffusion model for face restoration. Specifically, we propose a visual representation embedder (VRE) to better capture prior information and understand the input face. In VRE, low-quality faces are processed by a visual tokenizer and subsequently embedded with a vector-quantized dictionary to generate visual prompts. Additionally, we incorporate a facial identity loss derived from face recognition to further ensure identity consistency. We further employ a generative adversarial network (GAN) as a guidance model to encourage distribution alignment between the restored face and the ground truth. Experimental results demonstrate that OSDFace surpasses current state-of-the-art (SOTA) methods in both visual quality and quantitative metrics, generating high-fidelity, natural face images with high identity consistency.


Reconciling Stochastic and Deterministic Strategies for Zero-shot Image Restoration using Diffusion Model in Dual

CVPR2025

Abstract

Plug-and-play (PnP) methods offer an iterative strategy for solving image restoration (IR) problems in a zero-shot manner, using a learned discriminative denoiser as the implicit prior. More recently, a sampling-based variant of this approach, which utilizes a pre-trained generative diffusion model, has gained great popularity for solving IR problems through stochastic sampling. The IR results using PnP with a pre-trained diffusion model show distinct characteristics compared to those using discriminative denoisers, i.e., improved perceptual quality at the cost of data fidelity. The unsatisfactory results are due to the lack of integration of these strategies in IR tasks. In this work, we propose a novel zero-shot IR scheme, dubbed Reconciling Diffusion Model in Dual (RDMD), which leverages only a single pre-trained diffusion model to construct two complementary regularizers. Specifically, the diffusion model in RDMD iteratively performs deterministic denoising and stochastic sampling, aiming to achieve high-fidelity image restoration with appealing perceptual quality. RDMD also allows users to customize the distortion-perception tradeoff with a single hyperparameter, enhancing the adaptability of the restoration process in different practical scenarios. Extensive experiments on several IR tasks demonstrate that our proposed method achieves superior results compared to existing approaches on both the FFHQ and ImageNet datasets.


Zero-Shot Image Restoration Using Few-Step Guidance of Consistency Models (and Beyond)

CVPR2025

Abstract

In recent years, it has become popular to tackle image restoration tasks with a single pretrained diffusion model (DM) and data-fidelity guidance, instead of training a dedicated deep neural network per task. However, such "zero-shot" restoration schemes currently require many Neural Function Evaluations (NFEs) to perform well, which may be attributed to the many NFEs needed in the original generative functionality of the DMs. Recently, faster variants of DMs have been explored for image generation. These include Consistency Models (CMs), which can generate samples via a couple of NFEs. However, existing works that use guided CMs for restoration still require tens of NFEs, or per-task fine-tuning of the model, which leads to a performance drop if the assumptions made during fine-tuning are not accurate. In this paper, we propose a zero-shot restoration scheme that uses CMs and operates well with as few as 4 NFEs. It is based on a wise combination of several ingredients: better initialization, back-projection guidance, and above all a novel noise injection mechanism. We demonstrate the advantages of our approach for image super-resolution, deblurring and inpainting. Interestingly, we show that the usefulness of our noise injection technique goes beyond CMs: it can also mitigate the performance degradation of existing guided DM methods when reducing their NFE count.
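Back-projection guidance, one of the ingredients named above, can be illustrated in its generic form: for observations y = A x, the update x - A^+(A x - y) moves an estimate onto the set of images consistent with y, where A^+ is the pseudoinverse of A. The sketch below works this out for super-resolution, assuming A is s x s average pooling, in which case A^+ simply replicates each low-resolution value over its block. The operator choice and function names are assumptions for illustration; the paper's guidance step may be weighted differently and is combined with its noise injection, which is not shown here.

```python
import numpy as np

def downsample(x, s):
    """A: s x s average pooling of an (H, W) image; H, W must be multiples of s."""
    h, w = x.shape
    return x.reshape(h // s, s, w // s, s).mean(axis=(1, 3))

def pinv_upsample(r, s):
    """A^+ for average pooling: replicate each value over its s x s block."""
    return np.kron(r, np.ones((s, s)))

def back_project(x, y, s):
    """One back-projection step: x - A^+(A x - y)."""
    return x - pinv_upsample(downsample(x, s) - y, s)
```

After one step the estimate reproduces the observation exactly under this noiseless model, while the detail inside each block, which A cannot see, is left unchanged.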