
Summary of key papers and blogs about diffusion models to learn about the topic. Detailed list of all published diffusion robotics papers.

MIT License

Diffusion-Literature-for-Robotics

"Creating noise from data is easy; creating data from noise is generative modeling."

Yang Song in "Score-Based Generative Modeling through Stochastic Differential Equations" Song et al., 2020

This repository offers a brief summary of essential papers and blogs on diffusion models, alongside a categorized collection of robotics diffusion papers and useful code repositories for starting your own diffusion robotics project.


Table of Contents

  1. Learning about Diffusion models

  2. Diffusion in Robotics

    2.1 Imitation Learning and Policy Learning

    2.2 Video Diffusion in Robotics

    2.3 Online RL

    2.4 Offline RL

    2.5 Inverse RL

    2.6 World Models

    2.7 Task and Motion Planning

    2.8 Tactile Sensing & Pose Estimation

    2.9 Robot Design and Development

  3. Code Implementations

  4. Diffusion History


Learning about Diffusion models

While many tutorials on diffusion models exist, below you can find an overview of some of the best introductory blog posts and videos:

If you don't like reading blog posts and prefer the original papers, below you can find a list with the most important diffusion theory papers:

A general list of all published diffusion papers can be found here: What's the score?


Diffusion in Robotics

Since modern diffusion models have been around for only 3 years, the literature on diffusion models in robotics is still small but growing rapidly. Below you can find most robotics diffusion papers that have been published at conferences or uploaded to arXiv so far:


Imitation Learning and Policy Learning


Video Diffusion in Robotics

The ability of Diffusion models to generate realistic videos over a long horizon has enabled new applications in the context of robotics.


Online RL

The standard policy gradient objective requires the gradient of the log-likelihood, which is only implicitly defined by the underlying Ordinary Differential Equation (ODE) of the diffusion model.
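To make the statement above concrete, here is a brief sketch of the two quantities involved. The notation ($\pi_\theta$ for the policy, $A(s, a)$ for an advantage estimate, $f_\theta$ for the drift of the sampling ODE, $p_T$ for the Gaussian prior at the final noise level) is assumed for illustration and is not taken from a specific paper in this list. The policy gradient needs $\nabla_\theta \log \pi_\theta(a \mid s)$, while for an ODE-based sampler the log-likelihood is only available through the instantaneous change-of-variables formula, which requires integrating a divergence along the whole sampling trajectory:

$$
\nabla_\theta J(\theta) = \mathbb{E}_{s,\, a \sim \pi_\theta}\left[ \nabla_\theta \log \pi_\theta(a \mid s)\, A(s, a) \right]
$$

$$
\log \pi_\theta(a_0 \mid s) = \log p_T(a_T) + \int_0^T \nabla_{a} \cdot f_\theta(a_t, t, s)\, dt,
\qquad \frac{d a_t}{dt} = f_\theta(a_t, t, s)
$$

Evaluating the integral means solving the ODE and estimating the divergence at every step, which is why the log-likelihood is described as only implicitly defined.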


Offline RL


Inverse RL


World Models


Task and Motion Planning


Tactile Sensing & Pose Estimation


Robot Development and Construction

Excited to see more diffusion papers in this area in the future! Using generative models to design robots is a very interesting idea, since it allows new robot designs to be generated and tested in simulation before they are built in the real world.


Code Implementations

Numerous implementations of diffusion models exist on GitHub. Below you can find a curated list of clean code variants of the most important diffusion models, both in general and for robotics:

  • Diffusers: the main diffusion project from Hugging Face, with numerous pre-trained diffusion models ready to use

  • k-diffusion: while it's not the official code base of the EDM diffusion models from Karras et al., 2022, it has very clean code and numerous samplers. Parts of the code have been used in various other projects such as Consistency Models from OpenAI and diffusers from Hugging Face.

  • denoising-diffusion-pytorch: a clean DDPM diffusion model implementation in PyTorch to get a good understanding of all the components

  • Diffuser: variants of this code are used in numerous trajectory diffusion offline RL papers listed above

  • diffusion_policy: beautiful code implementation of diffusion policies from Chi et al., 2023 for imitation learning, with 9 different simulations to test the models on

  • octo-models: the first open-source foundation behavior diffusion agent, pretrained on 800k trajectories of different embodiments. The JAX code allows you to download their weights and finetune your own Octo model on your local dataset.

  • 3d_diffuser_actor: clean code to get started with 3D-based diffusion policies on the popular RLBench and CALVIN benchmarks.

  • flow-diffusion: if you want to start training your own video diffusion model, this is the right repository to start! Clean code implementations and available pre-trained weights for a real-world dataset and two simulations.

  • dpm-solver: one of the most widely used ODE samplers for diffusion models from Lu et al., 2022, with implementations for all different diffusion models, including wrappers for discrete DDPM variants


Diffusion History

Diffusion models are a type of generative model inspired by non-equilibrium thermodynamics, introduced by Sohl-Dickstein et al. (2015). The forward diffusion process is a Markov chain whose steps gradually add random Gaussian noise to a data sample, and the model learns to invert this process. Although the paper was presented in 2015, it took several years for diffusion models to gain widespread attention in the research community. The main focus of generative modeling research is vision-based applications, so the theory papers mentioned below mostly deal with image synthesis and related tasks.
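The noising Markov chain described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the formulation of any particular paper: the linear variance schedule, its endpoints, the number of steps, and all function names are assumptions chosen for the example.

```python
import math
import random


def linear_beta_schedule(num_steps, beta_start=1e-4, beta_end=0.02):
    """Per-step noise variances, linearly spaced (an assumed schedule)."""
    if num_steps == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def forward_diffusion(x0, betas, rng):
    """Run the chain x_t = sqrt(1 - beta_t) * x_{t-1} + sqrt(beta_t) * eps.

    Returns every intermediate state, starting with the clean sample x0.
    """
    trajectory = [list(x0)]
    x = list(x0)
    for beta in betas:
        x = [
            math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * rng.gauss(0.0, 1.0)
            for xi in x
        ]
        trajectory.append(x)
    return trajectory


def signal_scale(betas):
    """Factor by which x0 is scaled after all steps: sqrt(prod(1 - beta_t))."""
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
    return math.sqrt(prod)


rng = random.Random(0)
betas = linear_beta_schedule(1000)
trajectory = forward_diffusion([1.0, -2.0, 0.5], betas, rng)

print(len(trajectory))      # clean sample plus one state per diffusion step
print(signal_scale(betas))  # small value: almost none of x0 remains at the end
```

After enough steps the sample is close to pure Gaussian noise, which is the starting point the learned reverse process has to work back from.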

There are two perspectives from which to view diffusion models. The first is based on the initial idea of Sohl-Dickstein et al. (2015), while the other is based on a different line of research known as score-based generative models. Song & Ermon (2019) proposed the noise-conditioned score network (NCSN), a predecessor of the score-based diffusion model. The main idea was to learn the score function of the unknown data distribution using a neural network. This approach had been around before, but their paper and the subsequent work of Song & Ermon (2020) enabled scaling score-based models to high-dimensional data distributions and made them competitive on image-generation tasks. The key idea in their work was to perturb the data distribution with various levels of Gaussian noise and learn a noise-conditional score model to predict the scores of the perturbed data distributions.
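As a toy illustration of learning the score of a noise-perturbed distribution, the sketch below uses 1D Gaussian data and a single noise level. Several things here are assumptions made for the example: a linear model fitted by least squares stands in for the neural network, the regression target is the denoising score matching target `-(x_noisy - x_clean) / sigma**2`, and the function name and parameter values are hypothetical. For Gaussian data the true score of the perturbed distribution is linear, so the fit can be checked against the analytic answer.

```python
import random


def fit_linear_score(data_mean, data_std, sigma, num_samples, rng):
    """Fit score(x) ~ slope * x + intercept via denoising score matching.

    Each clean sample is perturbed with Gaussian noise of scale sigma, and
    the model regresses onto the score of the perturbation kernel.
    """
    xs, targets = [], []
    for _ in range(num_samples):
        x_clean = rng.gauss(data_mean, data_std)
        x_noisy = x_clean + sigma * rng.gauss(0.0, 1.0)
        xs.append(x_noisy)
        targets.append(-(x_noisy - x_clean) / sigma ** 2)

    # Ordinary least squares for a single input feature.
    mean_x = sum(xs) / num_samples
    mean_t = sum(targets) / num_samples
    cov = sum((x - mean_x) * (t - mean_t) for x, t in zip(xs, targets))
    var = sum((x - mean_x) ** 2 for x in xs)
    slope = cov / var
    intercept = mean_t - slope * mean_x
    return slope, intercept


rng = random.Random(0)
data_mean, data_std, sigma = 2.0, 1.0, 1.0
slope, intercept = fit_linear_score(data_mean, data_std, sigma, 20000, rng)

# The perturbed data is N(data_mean, data_std^2 + sigma^2), so its score is
# -(x - data_mean) / (data_std^2 + sigma^2).
total_var = data_std ** 2 + sigma ** 2
print(slope, -1.0 / total_var)          # fitted vs. analytic slope
print(intercept, data_mean / total_var)  # fitted vs. analytic intercept
```

A noise-conditional model would repeat this for several values of `sigma` and take the noise level as an additional input.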

Ho et al. (2020) introduced denoising diffusion probabilistic models (DDPM), which served as the foundation for the success of diffusion models. At that time, diffusion models were still not competitive with state-of-the-art generative models such as GANs. However, this changed rapidly the following year, when Nichol & Dhariwal (2021) improved upon the previous paper and demonstrated that diffusion models are competitive with GANs on image synthesis tasks. Nevertheless, it is important to note that diffusion models are not a jack of all trades. They still struggle with certain image traits, such as generating realistic faces or the right number of fingers.

Another important idea for diffusion models in the context of image generation has been the introduction of latent diffusion models by Rombach & Blattmann et al. (2022). By training the diffusion model in the latent space rather than directly in the image space, they were able to improve the sampling and training speed and made it possible for everyone to run their own diffusion model on a local PC with a single GPU. Recent AI-generated art is mostly based on the open-source Stability AI implementation of latent diffusion models: GitHub repo. Check out some cool diffusion art on the stable-diffusion-reddit.

Conditional Diffusion models

The initial diffusion models are usually trained on marginal distributions $p(x)$, but conditional image generation is also a research area of great interest. Therefore, we need conditional diffusion models to guide the generation process. Currently, there are three common methods to enable conditional generation with diffusion models:

Classifier-free guidance (CFG) is used in many applications, since it allows a conditional and an unconditional diffusion model to be trained at the same time. During inference, both models can be combined and the generation process controlled with a guidance weight.
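The combination of the two predictions at inference time can be sketched as follows. This is a minimal illustration under stated assumptions: noise predictions are plain lists of floats, the convention used is `eps_uncond + w * (eps_cond - eps_uncond)` (other papers parameterize the weight differently), and both function names are hypothetical. The first helper shows one common way to obtain both models from a single network, namely randomly replacing the condition with a null value during training.

```python
import random


def maybe_drop_condition(condition, drop_prob, rng):
    """Training-time step: with probability drop_prob, hide the condition.

    Returning None stands for the 'null' condition, so the same network
    also learns the unconditional prediction.
    """
    return None if rng.random() < drop_prob else condition


def cfg_combine(eps_cond, eps_uncond, guidance_weight):
    """Blend conditional and unconditional noise predictions.

    guidance_weight = 0 gives the unconditional prediction, 1 gives the
    conditional one, and values above 1 extrapolate towards the condition.
    """
    return [
        u + guidance_weight * (c - u)
        for c, u in zip(eps_cond, eps_uncond)
    ]


eps_cond = [1.0, 2.0]
eps_uncond = [0.0, 1.0]

print(cfg_combine(eps_cond, eps_uncond, 0.0))  # [0.0, 1.0]
print(cfg_combine(eps_cond, eps_uncond, 1.0))  # [1.0, 2.0]
print(cfg_combine(eps_cond, eps_uncond, 2.0))  # [2.0, 3.0]
```

In a full sampler, `cfg_combine` would be applied at every denoising step, with the network evaluated once with the condition and once with the null condition.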

Diffusion models perspectives

As previously mentioned, diffusion models can be viewed from two different perspectives:

There has been a lot of effort to combine these two views into one general framework. The best generalization has been the idea of stochastic differential equations (SDEs), first presented in Song et al. (2021) and further developed into a unified framework in Karras et al. (2022).

While diffusion models have mainly been applied in the area of generative modeling, recent work has shown promising applications in robotics. For instance, they have been used for behavior cloning and offline reinforcement learning, as well as for generating more diverse training data for robotics tasks.

Diffusion models offer several useful properties in the context of robotics, including:

  • Expressiveness: they can learn arbitrarily complicated data distributions
  • Training stability: they are easy to train, especially in contrast to GANs or EBMs
  • Multimodality: they are able to learn complicated multimodal distributions
  • Compositionality: diffusion models can be combined in a flexible way to jointly generate new samples

Overall, diffusion models have the potential to be a valuable tool for robotics.