Diffusion models
Setup
- $q(x_0)$: true sampling distribution of observed data
- $p_\theta(x_0)$: learned sampling distribution
- $x_1, \dots, x_T$: noisy latent variables at each step
Forward process
In the forward process, noise is continuously added to the data across multiple steps, converging toward white noise $x_T$:

$$q(x_t \mid x_{t-1}) = \mathcal{N}\left(x_t;\ \sqrt{1 - \beta_t}\, x_{t-1},\ \beta_t I\right)$$

The fixed Gaussian forward process means we can directly calculate/sample from $q(x_t \mid x_0)$ without calculating all the intermediate steps (similar to autoregressive models):

$$x_t = \sqrt{\bar{\alpha}_t}\, x_0 + \sqrt{1 - \bar{\alpha}_t}\, \epsilon$$

Where $\alpha_t = 1 - \beta_t$, $\bar{\alpha}_t = \prod_{s=1}^{t} \alpha_s$, and $\epsilon \sim \mathcal{N}(0, I)$ is white noise. Again, this is 100% analogous to the forecasting equation for an AR(1).
In practice, $\alpha_t$ is chosen to be close to 1 (i.e. $\beta_t$ is small), as this yields the best results.
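As a concrete sketch of the closed-form forward sample, here is a minimal NumPy version assuming the standard DDPM setup; the linear $\beta$ schedule and the constants `1e-4`/`0.02` are illustrative choices, not something specified in these notes:

```python
import numpy as np

# Assumed linear noise schedule: beta_t small, so alpha_t = 1 - beta_t stays close to 1.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)  # \bar{alpha}_t = prod_{s<=t} alpha_s

def q_sample(x0, t, rng=np.random.default_rng(0)):
    """Sample x_t ~ q(x_t | x_0) in one shot: x_t = sqrt(abar_t) x0 + sqrt(1 - abar_t) eps."""
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

x0 = np.ones(4)              # toy "data" point
xT, _ = q_sample(x0, T - 1)  # at t = T-1, abar is near 0, so x_T is essentially white noise
```

No loop over intermediate steps is needed, which is exactly the AR(1)-style forecasting shortcut mentioned above.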
Denoising
In the denoising or reverse diffusion process, noise is progressively removed over multiple steps, modeling the inverse process:

$$p_\theta(x_{t-1} \mid x_t) = \mathcal{N}\left(x_{t-1};\ \mu_\theta(x_t, t),\ \Sigma_\theta(x_t, t)\right)$$

Where $\Sigma_\theta(x_t, t)$ is commonly fixed to $\sigma_t^2 I$ rather than learned.
$\mu_\theta$ is parameterized using a denoising network, $\epsilon_\theta(x_t, t)$ (of which many exist, including the popular U-Net):

$$\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{\beta_t}{\sqrt{1 - \bar{\alpha}_t}}\, \epsilon_\theta(x_t, t)\right)$$

This model is trained using the objective function:

$$L = \mathbb{E}_{t,\, x_0,\, \epsilon}\left[\left\lVert \epsilon - \epsilon_\theta(x_t, t) \right\rVert^2\right]$$
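A minimal sketch of the training objective and one reverse (ancestral sampling) step, assuming the standard $\epsilon$-prediction parameterization. The `eps_model` function here is a hypothetical stand-in for the denoising network $\epsilon_\theta$ (in practice a U-Net or similar), and the schedule constants are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # assumed linear schedule, for illustration
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_model(x_t, t):
    """Hypothetical stand-in for eps_theta(x_t, t); a real model would be a U-Net etc."""
    return np.zeros_like(x_t)

def training_loss(x0):
    """L = E_{t, x0, eps} || eps - eps_theta(x_t, t) ||^2 for one random (t, eps) draw."""
    t = rng.integers(0, T)
    eps = rng.standard_normal(x0.shape)
    x_t = np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
    return np.mean((eps - eps_model(x_t, t)) ** 2)

def reverse_step(x_t, t):
    """One step x_{t-1} ~ p_theta(x_{t-1} | x_t), with fixed variance sigma_t^2 = beta_t."""
    eps_hat = eps_model(x_t, t)
    mu = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    z = rng.standard_normal(x_t.shape) if t > 0 else 0.0  # no noise on the final step
    return mu + np.sqrt(betas[t]) * z
```

Starting from $x_T \sim \mathcal{N}(0, I)$ and applying `reverse_step` for $t = T-1, \dots, 0$ yields a sample from the learned distribution.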
Intuition
One intuitive way of thinking about diffusion models, which I came up with while watching Advancing Diffusion Models for Text Generation: think of diffusion modeling as picking up a handful of sand (sampled random noise) and throwing the grains against a canvas (the manifold of the data distribution you care about, e.g. images). If you throw the sand in a particularly skillful way, the patterns that land on the canvas will look like recognizable images. Training a diffusion model is essentially learning this motion of skillfully throwing the sand so as to end up with an interesting pattern on the canvas. This is obviously somewhat complicated, and it is a bit of a miracle that this even works / is possible!