CODE: Confident Ordinary Differential Editing

École Polytechnique Fédérale de Lausanne (EPFL)

CODE is a novel approach to image editing that handles noisy or out-of-distribution image guidance effectively. Using a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow ODE trajectory, requires no task-specific training or handcrafted modules, and operates in a fully blind manner. A confidence interval-based clipping technique further improves blind restoration. Experimental results demonstrate CODE's effectiveness, particularly on severely degraded or out-of-distribution inputs.


Abstract

Conditioning image generation on an input image facilitates seamless editing and the creation of photorealistic images. However, conditioning on noisy or Out-of-Distribution (OoD) images poses significant challenges, particularly in balancing fidelity to the input with realism of the output. We introduce Confident Ordinary Differential Editing (CODE), a novel approach for image synthesis that effectively handles OoD guidance images. Utilizing a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory. This method requires no task-specific training, handcrafted modules, or assumptions about the corruption, and is compatible with any diffusion model. Positioned at the intersection of conditional image generation and blind image restoration, CODE operates in a fully blind manner, relying solely on a pre-trained generative model. Our method introduces an alternative approach to blind restoration: instead of targeting a specific ground-truth image based on assumptions about the underlying corruption, CODE aims to increase the likelihood of the input image while maintaining fidelity to it, yielding the most probable in-distribution image around the input. Our contributions are twofold. First, CODE introduces a novel ODE-based editing method that provides enhanced control, realism, and fidelity compared to its SDE-based counterparts. Second, we introduce a confidence interval-based clipping method, which improves CODE's effectiveness by allowing it to disregard certain pixels or information, thus enhancing the restoration process in a blind manner. Experimental results demonstrate CODE's effectiveness over existing methods, particularly in scenarios involving severe degradation or OoD inputs.

Method Figure: Editing corrupted images with CODE. The blue dots illustrate the editing process, and the green contour plot represents the distribution of images. Given a corrupted image, we encode it into a latent space using the probability-flow ODE. We then apply Langevin dynamics in the latent space to correct the encoded image. Finally, we project the updated latent back into the visual domain.

Ordinary Differential Editing

Our approach, illustrated in Figure 2, provides a theoretically grounded method for mapping Out-of-Distribution (OoD) samples to in-distribution ones. The process involves the following steps:

  1. Inversion of the Probability-Flow Ordinary Differential Equation (ODE): Inspired by SDEdit, we encode the input by inverting the diffusion process, but without injecting extra noise, thus maintaining the fidelity of the reconstructed image. This inversion ensures precise image reconstruction, limited only by approximation errors.
  2. Langevin Dynamics in Latent Spaces: Applying Langevin dynamics within the latent spaces allows gradient updates that increase the likelihood of the latent representation. The method can be tuned to prioritize either realism or fidelity through the choice of Langevin step size and of the latent space in which the correction is performed; a sketch of both steps follows this list.
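
To make the two steps concrete, below is a minimal PyTorch-style sketch, not the paper's reference implementation: `eps_model` stands for a pre-trained noise-prediction network and `alphas` for the decreasing sequence of cumulative schedule values (the \( \bar{\alpha}_t \) of DDPM); both names are assumptions for illustration. Decoding back to image space runs the same ODE update with the schedule order reversed.

    import torch

    @torch.no_grad()
    def ode_invert(x0, eps_model, alphas):
        """Encode an image into the latent space by integrating the
        probability-flow ODE backwards (DDIM-style inversion, no extra noise).
        alphas: decreasing sequence of cumulative schedule values (floats)."""
        x = x0
        for i in range(len(alphas) - 1):
            a_t, a_next = alphas[i], alphas[i + 1]
            eps = eps_model(x, i)  # predicted noise at the current step
            x0_pred = (x - (1 - a_t) ** 0.5 * eps) / a_t ** 0.5
            x = a_next ** 0.5 * x0_pred + (1 - a_next) ** 0.5 * eps
        return x

    @torch.no_grad()
    def langevin_correct(x_t, eps_model, alpha_t, t, n_iters, step_size):
        """Unadjusted Langevin dynamics in the latent space; each step moves
        the latent toward higher likelihood under the diffusion prior, using
        the standard relation score(x_t) = -eps(x_t, t) / sqrt(1 - alpha_t)."""
        for _ in range(n_iters):
            score = -eps_model(x_t, t) / (1 - alpha_t) ** 0.5
            x_t = x_t + step_size * score + (2 * step_size) ** 0.5 * torch.randn_like(x_t)
        return x_t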

The primary motivation for inverting the degraded image is that the model processes OoD images poorly: estimating the score directly on a degraded image is unreliable. By mapping the corrupted input back to the latent space, we obtain more accurate estimates within a distribution closely resembling a multivariate Gaussian.

This method, detailed in Algorithm 1 in the paper, decouples noise injection levels, correction levels, and latent spaces, enhancing control over the editing process. Our experimental results show CODE’s superiority over SDEdit in realism and fidelity, especially in challenging scenarios.

Confidence-Based Clipping

We introduce a confidence-based clipping method for the latent codes that depends neither on the model's prediction nor on the original sample. It leverages the cumulative distribution function of the standard normal distribution to define confidence intervals within which latent codes are clipped during the encoding process.

Proposition 1: Let \( \Phi \) be the cumulative distribution function of the standard normal distribution \( N(0, 1) \) and let \( x_0 \in [-1, 1] \). For \( \alpha_t \in [0, 1] \), \( \forall t \in [0, 1] \), assume that \( x_t \sim N(\sqrt{\alpha_t} \cdot x_0, (1 - \alpha_t) \cdot I) \). Then, for all \( \eta \geq 0 \):

\( P\big(x_t \in \big[ -\sqrt{\alpha_t} - \eta \cdot \sqrt{1 - \alpha_t},\ \sqrt{\alpha_t} + \eta \cdot \sqrt{1 - \alpha_t} \big]\big) \geq \Phi(\eta) - \Phi(-\eta) \).

The bound follows because \( x_0 \in [-1, 1] \) places the mean \( \sqrt{\alpha_t} \cdot x_0 \) inside \( [-\sqrt{\alpha_t}, \sqrt{\alpha_t}] \), so the interval above always contains the mean plus or minus \( \eta \) standard deviations.

Specifically, for \( \eta = 2 \):

\( P(x_t \in [ -\sqrt{\alpha_t} - 2 \cdot \sqrt{1 - \alpha_t}, \sqrt{\alpha_t} + 2 \cdot \sqrt{1 - \alpha_t} ]) \geq 0.95 \).
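
As a quick numeric check of the \( \eta = 2 \) case, the two-sided mass of a standard normal within two standard deviations can be computed from the error function; a minimal Python snippet (the helper name is ours):

    import math

    def std_normal_cdf(x):
        # Phi(x) for the standard normal, via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    eta = 2.0
    mass = std_normal_cdf(eta) - std_normal_cdf(-eta)
    print(f"Phi({eta}) - Phi(-{eta}) = {mass:.4f}")  # ~0.9545, hence >= 0.95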

During the encoding process, we propose to clip the latent codes using a confidence interval derived from Proposition 1:

\( x^{\text{clipped}}_{t} = \mathrm{Clip}\big(x_t,\ \text{min} = -\sqrt{\alpha_t} - \eta \cdot \sqrt{1 - \alpha_t},\ \text{max} = \sqrt{\alpha_t} + \eta \cdot \sqrt{1 - \alpha_t}\big) \),

where \( t \) is the timestep, \( \alpha_t \) is the value of the diffusion model's predefined noise schedule at \( t \), and \( \eta \) is the chosen confidence parameter.
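
This rule translates directly to code. A hedged PyTorch sketch (the function name and the scalar `alpha_t` interface are our assumptions):

    import math
    import torch

    def confidence_clip(x_t, alpha_t, eta=2.0):
        """Clip a latent code x_t to the Proposition 1 confidence interval.
        alpha_t: scalar cumulative schedule value at the current timestep."""
        bound = math.sqrt(alpha_t) + eta * math.sqrt(1.0 - alpha_t)
        return torch.clamp(x_t, min=-bound, max=bound)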

Like our editing method, Confidence-Based Clipping (CBC) is agnostic to the input and therefore suited to blind restoration scenarios. Combining CBC with our ODE editing method forms the complete CODE method; the two components synergize, further enhancing the restoration process in a blind manner.
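
One natural way to combine the two pieces, reusing the hypothetical helpers sketched above, is to apply the clip at every step of the ODE encoding; again a sketch under the same assumed interfaces, not the paper's reference implementation:

    import torch

    @torch.no_grad()
    def code_encode(x0, eps_model, alphas, eta=2.0):
        """Probability-flow ODE inversion with Confidence-Based Clipping
        applied to the latent code after every encoding step."""
        x = x0
        for i in range(len(alphas) - 1):
            a_t, a_next = alphas[i], alphas[i + 1]
            eps = eps_model(x, i)
            x0_pred = (x - (1 - a_t) ** 0.5 * eps) / a_t ** 0.5
            x = a_next ** 0.5 * x0_pred + (1 - a_next) ** 0.5 * eps
            x = confidence_clip(x, a_next, eta)  # CBC at each step
        return x

The resulting latent is then corrected with Langevin dynamics and decoded back through the ODE, as described above.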

Performance

We conducted extensive experiments to evaluate the performance of CODE across various corruption types. The experiments were carried out on multiple datasets, including CelebA-HQ, LSUN-Bedroom, and LSUN-Church, all at 256x256 resolution. Our setup utilized open-source pre-trained DDPM models and followed the DDIM inversion method with 200 steps.

Experimental Setup: We tested our approach on 47 corruption types, including noise, blur, weather, and digital artifacts. The corruption datasets are publicly available and were used to benchmark the performance of CODE against existing methods.

The results, also shown below, demonstrate CODE's effectiveness over existing methods. CODE significantly improves performance on complex corruptions such as Fog and Contrast, where other baselines are less versatile and require extra training.

Figure Performance: Comparison between CODE and SDEdit in terms of fidelity kept per realism gained. CODE produces more realistic images while maintaining a higher degree of fidelity.

Metrics Table: Ablation of the components of CODE. CBC increases stability and helps the DDIM reconstruction discard unwanted pixels, while the ODE-based correction with Langevin sampling increases the likelihood. Combined, the two components form CODE, which outperforms either component alone as well as SDEdit.

Our evaluation shows that CODE consistently produces high-quality restorations across diverse and severe corruptions, achieving superior realism and fidelity compared to SDEdit and other baselines.

Examples on 47 Corruptions

BibTeX


@misc{vandelft2024codeconfidentordinarydifferential,
      title={CODE: Confident Ordinary Differential Editing}, 
      author={Bastien van Delft and Tommaso Martorella and Alexandre Alahi},
      year={2024},
      eprint={2408.12418},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12418}, 
}