# Diffusion models

## Setup

- $q(y)$: true sampling distribution of observed data
- $p_{θ}(y)$: learned sampling distribution
- $x_{1:T}$: noisy latent variables at each step $t$

## Forward process

In the forward process, noise is added to the data over many steps, converging toward white noise $x_T \sim \mathcal{N}(0, I)$:

$$q(x_t \mid x_{t-1}) := \mathcal{N}\left(\sqrt{1-\beta_t}\, x_{t-1},\ \beta_t I\right)$$

Because the forward process is a fixed Gaussian, we can directly calculate/sample from $q(x_t \mid y)$ without computing all the intermediate steps (similar to autoregressive models):

$$x_t = \sqrt{\bar{\alpha}_t}\, y + \sqrt{1-\bar{\alpha}_t}\, \epsilon$$

where $\alpha_t = 1-\beta_t$, $\bar{\alpha}_t = \prod_{i=1}^{t} \alpha_i$, and $\epsilon$ is white noise. Again, this is directly analogous to the forecasting equation for an AR(1).
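The closed-form forward sample $x_t = \sqrt{\bar{\alpha}_t}\, y + \sqrt{1-\bar{\alpha}_t}\, \epsilon$ can be sketched in a few lines of numpy. The linear beta schedule and $T = 1000$ here are illustrative assumptions, not taken from the source:

```python
import numpy as np

# Assumed linear beta schedule (illustrative choice).
T = 1000
betas = np.linspace(1e-4, 0.02, T)   # beta_t for t = 1..T
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{i<=t} alpha_i

def q_sample(y, t, rng):
    """Sample x_t directly from clean data y at step t (1-indexed)."""
    eps = rng.standard_normal(y.shape)          # white noise
    a_bar = alpha_bars[t - 1]
    return np.sqrt(a_bar) * y + np.sqrt(1.0 - a_bar) * eps

rng = np.random.default_rng(0)
y = rng.standard_normal(16)
x_T = q_sample(y, T, rng)  # at t = T this is close to white noise
```

Because $\bar{\alpha}_T \approx 0$ under such a schedule, $x_T$ is essentially the pure noise term $\epsilon$.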

In practice, each $\beta_t$ is chosen to be small, so that $\alpha_t$ is close to 1; adding noise gradually over many steps yields the best results.
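A quick numerical check of this claim, again under an assumed linear beta schedule: every per-step $\alpha_t$ stays near 1, yet their cumulative product $\bar{\alpha}_t$ still decays toward 0, so the chain ends in white noise.

```python
import numpy as np

# Assumed linear beta schedule, for illustration only.
betas = np.linspace(1e-4, 0.02, 1000)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

print(alphas.min())     # smallest alpha_t, still close to 1
print(alpha_bars[-1])   # alpha_bar_T, close to 0: x_T is nearly pure noise
```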

## Denoising

In the denoising or reverse diffusion process, noise is progressively removed over multiple steps, modeling the inverse of the forward process:

$$p_{\theta}(x_{t-1} \mid x_t) := \mathcal{N}\left(\mu_{\theta}(x_t, t),\ \sigma_t^2 I\right)$$

where $\sigma_t^2 = \frac{1-\bar{\alpha}_{t-1}}{1-\bar{\alpha}_t}\, \beta_t$.
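One reverse step can be sketched as follows. This is a hypothetical illustration: `eps_theta` is a placeholder for the trained denoising network, and the linear beta schedule is an assumption.

```python
import numpy as np

# Assumed linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

def eps_theta(x_t, t):
    # Placeholder for the learned noise predictor (e.g. a U-Net).
    return np.zeros_like(x_t)

def reverse_step(x_t, t, rng):
    """Draw x_{t-1} ~ N(mu_theta(x_t, t), sigma_t^2 I); t is 1-indexed."""
    a_t, b_t, a_bar_t = alphas[t - 1], betas[t - 1], alpha_bars[t - 1]
    mu = (x_t - b_t / np.sqrt(1.0 - a_bar_t) * eps_theta(x_t, t)) / np.sqrt(a_t)
    if t == 1:
        return mu  # last step: return the mean, no noise added
    a_bar_prev = alpha_bars[t - 2]
    sigma2 = (1.0 - a_bar_prev) / (1.0 - a_bar_t) * b_t
    return mu + np.sqrt(sigma2) * rng.standard_normal(x_t.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(8)          # start from white noise x_T
for t in range(T, 0, -1):           # denoise step by step down to x_0
    x = reverse_step(x, t, rng)
```

Sampling a data point then amounts to starting from white noise and applying `reverse_step` for $t = T, \ldots, 1$.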

$\mu_{\theta}$ is parameterized using a denoising network $\epsilon_{\theta}$ (of which many exist, including the popular U-Net):

$$\mu_{\theta}(x_t, t) = \frac{1}{\sqrt{\alpha_t}} \left( x_t - \frac{\beta_t}{\sqrt{1-\bar{\alpha}_t}}\, \epsilon_{\theta}(x_t, t) \right)$$

This model is trained using the objective function:

$$\mathbb{E}_{y, \epsilon, t}\left[ \left\lVert \epsilon_{\theta}(x_t, t) - \epsilon \right\rVert_2^2 \right]$$
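A single Monte Carlo evaluation of this objective can be sketched as below. The toy linear `eps_theta` and the linear beta schedule are illustrative assumptions; in practice the network is a neural net trained by gradient descent on this loss.

```python
import numpy as np

# Assumed linear beta schedule.
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bars = np.cumprod(1.0 - betas)

def eps_theta(x_t, t, w):
    return w * x_t  # toy stand-in for a denoising network

def loss(y, w, rng):
    """One sample of E_{y, eps, t}[ || eps_theta(x_t, t) - eps ||^2 ]."""
    t = int(rng.integers(1, T + 1))        # t ~ Uniform{1, ..., T}
    eps = rng.standard_normal(y.shape)     # noise target
    a_bar = alpha_bars[t - 1]
    x_t = np.sqrt(a_bar) * y + np.sqrt(1.0 - a_bar) * eps  # forward sample
    return np.sum((eps_theta(x_t, t, w) - eps) ** 2)

rng = np.random.default_rng(0)
y = rng.standard_normal(8)
l = loss(y, w=0.5, rng=rng)
```

Note that training never requires running the full forward chain: each loss evaluation draws a random $t$ and uses the closed-form sample of $q(x_t \mid y)$.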

## References

@kolloviehPredictRefineSynthesize2023