Diffusion model


Diffusion models^[1] are a way of generating realistic images using artificial intelligence. They work by applying Bayesian inference to reverse a process that adds random Gaussian noise to an image. In the forward process, a long Markov chain of typically 1000 steps adds a little random noise at a time to a starting image, gradually degrading it until it is indistinguishable from pure noise. The reverse, step-by-step process is called denoising: generation starts from an image of pure noise, which is gradually denoised until the result looks realistic.
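
The forward (noising) chain described above can be sketched in Python. This is a minimal sketch, assuming the variance-preserving update $x_{t}={\sqrt {1-\beta _{t}}}\,x_{t-1}+{\sqrt {\beta _{t}}}\,\epsilon$ with $\epsilon$ standard Gaussian noise, and the linear schedule of per-step variances $\beta _{t}$ rising from $10^{-4}$ to $0.02$ over 1000 steps used by Ho et al.^[1]; the function names and the toy "image" (a flat list of pixel values in [-1, 1]) are hypothetical illustrations.

```python
import math
import random


def linear_beta_schedule(num_steps=1000, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances beta_1..beta_T, increasing linearly (assumed schedule)."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_noising(x0, betas, rng):
    """Run the Markov chain x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * noise.

    Each step shrinks the current image slightly and adds a little fresh
    Gaussian noise, so after many steps almost nothing of x0 remains.
    """
    x = list(x0)
    for beta in betas:
        keep = math.sqrt(1.0 - beta)
        add = math.sqrt(beta)
        x = [keep * v + add * rng.gauss(0.0, 1.0) for v in x]
    return x


def remaining_signal_fraction(betas):
    """alpha_bar_T = prod(1 - beta_t): the variance weight left on x0 after all steps."""
    alpha_bar = 1.0
    for beta in betas:
        alpha_bar *= 1.0 - beta
    return alpha_bar


if __name__ == "__main__":
    rng = random.Random(0)
    betas = linear_beta_schedule()
    image = [rng.uniform(-1.0, 1.0) for _ in range(256)]  # toy 256-pixel "image"
    noisy = forward_noising(image, betas, rng)
    print(f"signal weight left: {remaining_signal_fraction(betas):.2e}")
    mean = sum(noisy) / len(noisy)
    var = sum((v - mean) ** 2 for v in noisy) / len(noisy)
    print(f"pixel variance of x_T: {var:.2f}")  # close to 1, i.e. unit Gaussian noise
```

Because every step multiplies the image by a factor slightly below 1, the weight left on the original image after all 1000 steps, $\prod _{t}(1-\beta _{t})$, falls below $10^{-4}$, so the final image is essentially pure Gaussian noise with variance close to 1.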

The neural network used for denoising is trained directly on images to which random noise has been added, learning to predict and remove that noise.
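
A minimal sketch of how one training example can be built, assuming the noise-prediction objective of Ho et al.^[1]: the network is shown a noisy image $x_{t}$ and asked to predict the Gaussian noise $\epsilon$ that was added, scored by mean squared error. Because independent Gaussian noises combine into a single Gaussian, $x_{t}$ can be drawn in one shot as $x_{t}={\sqrt {{\bar {\alpha }}_{t}}}\,x_{0}+{\sqrt {1-{\bar {\alpha }}_{t}}}\,\epsilon$ with ${\bar {\alpha }}_{t}=\prod _{s\leq t}(1-\beta _{s})$ instead of running $t$ separate steps. The helper names and the `model(x_t, t)` callable are hypothetical; no real network or optimizer is included.

```python
import math
import random


def noisy_training_pair(x0, alpha_bar_t, rng):
    """Sample x_t directly: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps.

    Returns the noisy image and the noise eps that was added (the training target).
    """
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = math.sqrt(alpha_bar_t)
    b = math.sqrt(1.0 - alpha_bar_t)
    x_t = [a * p + b * e for p, e in zip(x0, eps)]
    return x_t, eps


def noise_prediction_loss(predicted_eps, true_eps):
    """Mean squared error between the network's noise guess and the real noise."""
    return sum((p - e) ** 2 for p, e in zip(predicted_eps, true_eps)) / len(true_eps)


def training_loss(model, x0, alpha_bars, rng):
    """Evaluate the loss on one randomly noised copy of x0.

    `model(x_t, t)` is a hypothetical stand-in for the neural network; a real
    implementation would back-propagate this loss to update the weights.
    """
    t = rng.randrange(len(alpha_bars))  # pick a random noise level
    x_t, eps = noisy_training_pair(x0, alpha_bars[t], rng)
    return noise_prediction_loss(model(x_t, t), eps)


if __name__ == "__main__":
    rng = random.Random(0)
    x0 = [0.5, -0.25, 1.0, 0.0]
    alpha_bars = [0.9, 0.5, 0.1]
    lazy_model = lambda x_t, t: [0.0] * len(x_t)  # hypothetical: always predicts "no noise"
    print("loss of lazy model:", training_loss(lazy_model, x0, alpha_bars, rng))
```

A model that always predicts zero noise gets a loss of about 1 on average (the variance of $\epsilon$), while a perfect noise predictor gets 0; training moves the network from the former toward the latter.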

DALL·E 2^[2] and Imagen^[3] are examples of diffusion models.

Technical details

Let $x$ represent the image and $y$ represent the text caption. Let $t$ represent the fraction of random noise added to the image, with $x_{t}$ being the image with a fraction $t$ of noise added; the variance of the noise is proportional to $t$. A useful property of Gaussian noise is that adding a noise of variance $t$ and then another independent noise of variance $\Delta t$ is equivalent to adding a single Gaussian noise of variance $t+\Delta t$, so the noising can be carried out as a sequence of small steps.

The score function is defined as $\nabla _{x_{t}}\log p(x_{t}|y)$. It points in the direction in which the noisy image becomes more probable given the caption, and it is what the denoising steps follow: a small denoising step of size $\Delta t$ approximately removes a Gaussian noise of variance $\Delta t$ by moving $x_{t}$ a short distance along the score function. A differentiable neural network is trained to predict the score function given the inputs $x_{t}$, $t$ and $y$.

By Bayes' theorem, $p(x_{t}|y)={\frac {p(y|x_{t})p(x_{t})}{p(y)}}$. Taking the logarithm and then the gradient with respect to $x_{t}$, the term $\log p(y)$ vanishes because it does not depend on $x_{t}$, so the score function is $\nabla _{x_{t}}\log p(x_{t})+\nabla _{x_{t}}\log p(y|x_{t})$. The first term is the unconditional term, which is independent of the caption. The second term measures how well the noisy image matches the caption, with $p(y|x_{t})$ given by a separate network such as CLIP. The score function can be modified to $\nabla _{x_{t}}\log p(x_{t})+\gamma \nabla _{x_{t}}\log p(y|x_{t})$, where $\gamma >1$ is the guidance parameter; weighting the caption term more heavily makes the generated images follow the caption more closely.
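
Score-based denoising with guidance can be illustrated on a one-dimensional toy problem in which the score is known exactly. This is a minimal sketch under strong assumptions: each "image" is a single number drawn from a Gaussian $N(\mu ,s^{2})$, so $p(x_{t})$ is Gaussian with variance $s^{2}+t$ and no neural network is needed; the caption term $p(y|x_{t})$ is replaced by a hypothetical logistic classifier $\sigma (wx)$ rather than CLIP; and denoising uses an Euler-Maruyama discretisation of the reverse-time process, which is an assumed choice of sampler rather than the one used by any particular model. All function names and parameter values are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def gaussian_score(x, t, data_mean, data_var):
    """Exact unconditional score d/dx log p(x_t) for toy data x ~ N(data_mean, data_var).

    Adding independent Gaussian noise of variance t gives
    x_t ~ N(data_mean, data_var + t), whose log-density has this gradient.
    """
    return -(x - data_mean) / (data_var + t)


def classifier_log_grad(x, weight):
    """d/dx log p(y | x) for a hypothetical classifier p(y | x) = sigmoid(weight * x).

    The gradient of log(sigmoid(w * x)) is w * sigmoid(-w * x), which is
    positive for w > 0, so this term pushes samples toward larger x.
    """
    return weight * sigmoid(-weight * x)


def sample(num_samples, t_max, num_steps, data_mean, data_var,
           gamma=0.0, weight=1.0, seed=0):
    """Generate samples by denoising from (almost) pure noise.

    Starts from x ~ N(data_mean, data_var + t_max) and walks t down to 0 with
    Euler-Maruyama steps of the reverse-time process:
        x <- x + dt * score + sqrt(dt) * z,   z ~ N(0, 1),
    where score = d/dx log p(x_t) + gamma * d/dx log p(y | x_t).
    gamma = 0 gives unguided sampling.
    """
    rng = random.Random(seed)
    dt = t_max / num_steps
    xs = [rng.gauss(data_mean, math.sqrt(data_var + t_max)) for _ in range(num_samples)]
    for step in range(num_steps):
        t = t_max - step * dt  # noise variance at the start of this step
        new_xs = []
        for x in xs:
            score = gaussian_score(x, t, data_mean, data_var)
            score += gamma * classifier_log_grad(x, weight)
            new_xs.append(x + dt * score + math.sqrt(dt) * rng.gauss(0.0, 1.0))
        xs = new_xs
    return xs


if __name__ == "__main__":
    plain = sample(1000, t_max=25.0, num_steps=500, data_mean=2.0, data_var=0.25)
    guided = sample(1000, t_max=25.0, num_steps=500, data_mean=2.0, data_var=0.25,
                    gamma=3.0)
    print("unguided mean:", sum(plain) / len(plain))   # near data_mean = 2.0
    print("guided mean:  ", sum(guided) / len(guided))  # shifted toward larger x
```

With `gamma = 0` the samples end up distributed close to the toy data distribution. With `gamma = 3.0` and the same seed, each guided sample ends to the right of its unguided counterpart, because the extra term $\gamma \nabla _{x_{t}}\log p(y|x_{t})$ keeps pulling toward the region the classifier favours.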

References

[1] Ho, Jonathan; Jain, Ajay; Abbeel, Pieter (2020-12-16). "Denoising Diffusion Probabilistic Models". arXiv:2006.11239 [cs.LG].
[2] "DALL·E 2". OpenAI. Retrieved 2022-05-25.
[3] "Imagen: Text-to-Image Diffusion Models". imagen.research.google. Retrieved 2022-05-25.

This article "Diffusion model" is from Wikipedia. The list of its authors can be seen in its history and/or on the page Edithistory:Diffusion model. Articles copied from the Draft namespace on Wikipedia can be found in Wikipedia's Draft namespace, not the main one.

