CODE: Confident Ordinary Differential Editing

École Polytechnique Fédérale de Lausanne (EPFL)

CODE is a novel approach to image editing that handles noisy or out-of-distribution image guidance effectively. Using a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow ODE trajectory, requires no task-specific training or handcrafted modules, and operates in a fully blind manner. A confidence interval-based clipping technique further improves blind restoration. Experimental results demonstrate CODE's effectiveness, particularly on severely degraded or out-of-distribution inputs.


Abstract

Conditioning image generation on an input image facilitates seamless editing and the creation of photorealistic images. However, conditioning on noisy or Out-of-Distribution (OoD) images poses significant challenges, particularly in balancing fidelity to the input with realism of the output. We introduce Confident Ordinary Differential Editing (CODE), a novel approach for image synthesis that effectively handles OoD guidance images. Utilizing a diffusion model as a generative prior, CODE enhances images through score-based updates along the probability-flow Ordinary Differential Equation (ODE) trajectory. This method requires no task-specific training, handcrafted modules, or assumptions about the corruption, and is compatible with any diffusion model. Positioned at the intersection of conditional image generation and blind image restoration, CODE operates in a fully blind manner, relying solely on a pre-trained generative model. Our method introduces an alternative approach to blind restoration: instead of targeting a specific ground-truth image based on assumptions about the underlying corruption, CODE aims to increase the likelihood of the input image while maintaining fidelity to it, yielding the most probable in-distribution image around the input. Our contributions are twofold. First, CODE introduces a novel ODE-based editing method that provides enhanced control, realism, and fidelity compared to its SDE-based counterparts. Second, we introduce a confidence interval-based clipping method, which improves CODE's effectiveness by allowing it to disregard certain pixels or information, thus enhancing the restoration process in a blind manner. Experimental results demonstrate CODE's effectiveness over existing methods, particularly in scenarios involving severe degradation or OoD inputs.

Method Figure: Editing corrupted images with CODE. The blue dots illustrate the editing process, and the green contour plot represents the distribution of images. Given a corrupted image, we encode it into a latent space using the probability-flow ODE. We then apply Langevin dynamics in the latent space to correct the encoded image. Finally, we project the updated latent back into the visual domain.

Ordinary Differential Editing

Our approach, illustrated in Figure 2, provides a theoretically grounded method for mapping Out-of-Distribution (OoD) samples to in-distribution ones. The process involves the following steps:

  1. Inversion of the Probability-Flow Ordinary Differential Equation (ODE): Inspired by SDEdit, we encode the input by inverting the diffusion process, but without injecting extra noise, thus maintaining the fidelity of the reconstructed image. This inversion ensures precise image reconstruction, limited only by approximation errors.
  2. Langevin Dynamics in Latent Spaces: Applying Langevin dynamics within the latent spaces allows gradient updates that increase the likelihood of the latent representation. The method can be tuned to prioritize either realism or fidelity through the choice of Langevin step size and of the latent space in which the correction is performed; a sketch of both steps follows this list.
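
To make the two steps concrete, below is a minimal PyTorch-style sketch, not the paper's reference implementation: `eps_model` stands for a pre-trained noise-prediction network and `alphas` for the decreasing sequence of cumulative schedule values (the \( \bar{\alpha}_t \) of DDPM); both names are assumptions for illustration. Decoding back to image space runs the same ODE update with the schedule order reversed.

    import torch

    @torch.no_grad()
    def ode_invert(x0, eps_model, alphas):
        """Encode an image into the latent space by integrating the
        probability-flow ODE backwards (DDIM-style inversion, no extra noise).
        alphas: decreasing sequence of cumulative schedule values (floats)."""
        x = x0
        for i in range(len(alphas) - 1):
            a_t, a_next = alphas[i], alphas[i + 1]
            eps = eps_model(x, i)  # predicted noise at the current step
            x0_pred = (x - (1 - a_t) ** 0.5 * eps) / a_t ** 0.5
            x = a_next ** 0.5 * x0_pred + (1 - a_next) ** 0.5 * eps
        return x

    @torch.no_grad()
    def langevin_correct(x_t, eps_model, alpha_t, t, n_iters, step_size):
        """Unadjusted Langevin dynamics in the latent space; each step moves
        the latent toward higher likelihood under the diffusion prior, using
        the standard relation score(x_t) = -eps(x_t, t) / sqrt(1 - alpha_t)."""
        for _ in range(n_iters):
            score = -eps_model(x_t, t) / (1 - alpha_t) ** 0.5
            x_t = x_t + step_size * score + (2 * step_size) ** 0.5 * torch.randn_like(x_t)
        return x_t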

The primary motivation for inverting the degraded image is that the model processes OoD images poorly: estimating the score directly on a degraded image is unreliable. By mapping the corrupted input back to the latent space, we obtain more accurate estimates within a distribution closely resembling a multivariate Gaussian.

This method, detailed in Algorithm 1 in the paper, decouples noise injection levels, correction levels, and latent spaces, enhancing control over the editing process. Our experimental results show CODE’s superiority over SDEdit in realism and fidelity, especially in challenging scenarios.

Confidence-Based Clipping

We introduce a confidence-based clipping method for the latent codes that depends neither on the model's prediction nor on the original sample. It leverages the cumulative distribution function of the standard normal distribution to define confidence intervals within which latent codes are clipped during the encoding process.

Proposition 1: Let \( \Phi \) be the cumulative distribution function of the standard normal distribution \( N(0, 1) \) and let \( x_0 \in [-1, 1] \). For \( \alpha_t \in [0, 1] \), \( \forall t \in [0, 1] \), assume that \( x_t \sim N(\sqrt{\alpha_t} \cdot x_0, (1 - \alpha_t) \cdot I) \). Then, for all \( \eta \geq 0 \):

\( P\big(x_t \in \big[ -\sqrt{\alpha_t} - \eta \cdot \sqrt{1 - \alpha_t},\ \sqrt{\alpha_t} + \eta \cdot \sqrt{1 - \alpha_t} \big]\big) \geq \Phi(\eta) - \Phi(-\eta) \).

The bound follows because \( x_0 \in [-1, 1] \) places the mean \( \sqrt{\alpha_t} \cdot x_0 \) inside \( [-\sqrt{\alpha_t}, \sqrt{\alpha_t}] \), so the interval above always contains the mean plus or minus \( \eta \) standard deviations.

Specifically, for \( \eta = 2 \):

\( P(x_t \in [ -\sqrt{\alpha_t} - 2 \cdot \sqrt{1 - \alpha_t}, \sqrt{\alpha_t} + 2 \cdot \sqrt{1 - \alpha_t} ]) \geq 0.95 \).
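
As a quick numeric check of the \( \eta = 2 \) case, the two-sided mass of a standard normal within two standard deviations can be computed from the error function; a minimal Python snippet (the helper name is ours):

    import math

    def std_normal_cdf(x):
        # Phi(x) for the standard normal, via the error function
        return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

    eta = 2.0
    mass = std_normal_cdf(eta) - std_normal_cdf(-eta)
    print(f"Phi({eta}) - Phi(-{eta}) = {mass:.4f}")  # ~0.9545, hence >= 0.95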

During the encoding process, we propose to clip the latent codes using a confidence interval derived from Proposition 1:

\( x^{\text{clipped}}_{t} = \mathrm{Clip}\big(x_t,\ \text{min} = -\sqrt{\alpha_t} - \eta \cdot \sqrt{1 - \alpha_t},\ \text{max} = \sqrt{\alpha_t} + \eta \cdot \sqrt{1 - \alpha_t}\big) \),

where \( t \) is the timestep, \( \alpha_t \) is the value of the diffusion model's predefined noise schedule at \( t \), and \( \eta \) is the chosen confidence parameter.
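
This rule translates directly to code. A hedged PyTorch sketch (the function name and the scalar `alpha_t` interface are our assumptions):

    import math
    import torch

    def confidence_clip(x_t, alpha_t, eta=2.0):
        """Clip a latent code x_t to the Proposition 1 confidence interval.
        alpha_t: scalar cumulative schedule value at the current timestep."""
        bound = math.sqrt(alpha_t) + eta * math.sqrt(1.0 - alpha_t)
        return torch.clamp(x_t, min=-bound, max=bound)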

Like our editing method, Confidence-Based Clipping (CBC) is agnostic to the input and therefore suited to blind restoration scenarios. Combining CBC with our ODE editing method forms the complete CODE method; the two components synergize, further enhancing the restoration process in a blind manner.
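
One natural way to combine the two pieces, reusing the hypothetical helpers sketched above, is to apply the clip at every step of the ODE encoding; again a sketch under the same assumed interfaces, not the paper's reference implementation:

    import torch

    @torch.no_grad()
    def code_encode(x0, eps_model, alphas, eta=2.0):
        """Probability-flow ODE inversion with Confidence-Based Clipping
        applied to the latent code after every encoding step."""
        x = x0
        for i in range(len(alphas) - 1):
            a_t, a_next = alphas[i], alphas[i + 1]
            eps = eps_model(x, i)
            x0_pred = (x - (1 - a_t) ** 0.5 * eps) / a_t ** 0.5
            x = a_next ** 0.5 * x0_pred + (1 - a_next) ** 0.5 * eps
            x = confidence_clip(x, a_next, eta)  # CBC at each step
        return x

The resulting latent is then corrected with Langevin dynamics and decoded back through the ODE, as described above.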

Performance

We conducted extensive experiments to evaluate the performance of CODE across various corruption types. The experiments were carried out on multiple datasets, including CelebA-HQ, LSUN-Bedroom, and LSUN-Church, all at 256x256 resolution. Our setup utilized open-source pre-trained DDPM models and followed the DDIM inversion method with 200 steps.

Experimental Setup: We tested our approach on 47 corruption types, including noise, blur, weather, and digital artifacts. The corruption datasets are publicly available and were used to benchmark the performance of CODE against existing methods.

The results, also shown below, demonstrate CODE's effectiveness over existing methods. CODE significantly improves performance on complex corruptions such as Fog and Contrast, where other baselines are less versatile and require extra training.

Figure Performance: Comparison between CODE and SDEdit in terms of fidelity kept per realism gained. CODE produces more realistic images while maintaining a higher degree of fidelity.

Metrics Table: Ablation of the components of CODE. CBC increases stability and helps the DDIM reconstruction discard unwanted pixels, while the ODE-based correction with Langevin sampling increases the likelihood. Combined, the two components form CODE, which outperforms either component alone as well as SDEdit.

Our evaluation shows that CODE consistently produces high-quality restorations across diverse and severe corruptions, achieving superior realism and fidelity compared to SDEdit and other baselines.

Examples on 47 Corruptions

BibTeX


@misc{vandelft2024codeconfidentordinarydifferential,
      title={CODE: Confident Ordinary Differential Editing}, 
      author={Bastien van Delft and Tommaso Martorella and Alexandre Alahi},
      year={2024},
      eprint={2408.12418},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2408.12418}, 
}