/awesome-neural-ode

MIT License

Awesome Neural ODE

A collection of resources regarding the interplay between differential equations, dynamical systems, deep learning, control, numerical methods and scientific machine learning.

NOTE: Feel free to suggest additions via Issues or Pull Requests.

The repo further introduces a (rough) categorization by assigning topic labels to each work. These are not supposed to be comprehensive or precise, and should only provide a rough idea of the contents.

IC TS DS T NM

Differential Equations in Deep Learning

General Architectures

TS

Multivariate time series data in practical applications, such as health care, geoscience, and biology, are characterized by a variety of missing values. We propose a GRU-based model called GRU-D, in which a decay mechanism is designed for the input variables and the hidden states to account for missing values. We introduce decay rates in the model to control the decay mechanism.
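
A minimal sketch of how such a decay mechanism could act on a missing input. The abstract does not give formulas, so the exponential form of the decay rate, the scalar weights `w` and `b`, and the function names are illustrative assumptions, not the authors' code.

```python
import math

def decay_rate(delta, w=0.5, b=0.0):
    """Decay factor in (0, 1]: gamma = exp(-max(0, w * delta + b)).

    Assumption: delta is the time since the feature was last observed;
    w and b stand in for learned parameters.
    """
    return math.exp(-max(0.0, w * delta + b))

def impute_with_decay(x, mask, delta, x_last, x_mean, w=0.5, b=0.0):
    """Use x if observed (mask=1); otherwise decay the last value toward the mean."""
    gamma = decay_rate(delta, w, b)
    decayed = gamma * x_last + (1.0 - gamma) * x_mean
    return mask * x + (1 - mask) * decayed

# One feature with last observed value 10.0 and empirical mean 4.0.
recent = impute_with_decay(x=0.0, mask=0, delta=0.1, x_last=10.0, x_mean=4.0)
stale = impute_with_decay(x=0.0, mask=0, delta=20.0, x_last=10.0, x_mean=4.0)
observed = impute_with_decay(x=7.0, mask=1, delta=0.0, x_last=10.0, x_mean=4.0)
print(recent, stale, observed)
```

A recently observed value stays close to its last reading, while a long-missing one drifts to the mean; observed values pass through unchanged.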

  • Learning unknown ODE models with Gaussian processes: arXiv18, code

DS

For many complex systems, it is practically impossible to determine the equations or interactions governing the underlying dynamics. In these settings, a parametric ODE model cannot be formulated. Here, we overcome this issue by introducing a novel paradigm of nonparametric ODE modeling that can learn the underlying dynamics of arbitrary continuous-time systems without prior knowledge. We propose to learn non-linear, unknown differential functions from state observations using Gaussian process vector fields within the exact ODE formalism.

DS TS

We present a new approach to modeling sequential data: the deep equilibrium model (DEQ). Motivated by an observation that the hidden layers of many existing deep sequence models converge towards some fixed point, we propose the DEQ approach that directly finds these equilibrium points via root-finding.
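
The equilibrium idea can be sketched on a toy weight-tied layer. This is a sketch under assumptions: the layer `tanh(W z + x)`, the small hand-picked weights, and plain fixed-point iteration (a simple stand-in for the root-finder mentioned above) are all illustrative choices.

```python
import math

def layer(z, x, W):
    """One weight-tied layer: z -> tanh(W z + x). Stand-in for a real sequence layer."""
    n = len(z)
    return [math.tanh(sum(W[i][j] * z[j] for j in range(n)) + x[i]) for i in range(n)]

def solve_equilibrium(x, W, tol=1e-10, max_iter=500):
    """Find z* with z* = layer(z*, x, W) by naive fixed-point iteration.

    Assumption: the map is contractive, so the iteration converges;
    a quasi-Newton root-finder could be swapped in here.
    """
    z = [0.0] * len(x)
    for _ in range(max_iter):
        z_next = layer(z, x, W)
        if max(abs(a - b) for a, b in zip(z, z_next)) < tol:
            return z_next
        z = z_next
    return z

W = [[0.2, -0.1], [0.3, 0.1]]   # small weights keep the map contractive
x = [0.5, -1.0]
z_star = solve_equilibrium(x, W)
residual = max(abs(a - b) for a, b in zip(z_star, layer(z_star, x, W)))
print(z_star, residual)
```

Applying the layer once more to `z_star` leaves it essentially unchanged, which is what "infinitely many layers" amounts to in this setting.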

  • Fast and Deep Graph Neural Networks: AAAI20

DS

We address the efficiency issue for the construction of a deep graph neural network (GNN). The approach exploits the idea of representing each input graph as a fixed point of a dynamical system (implemented through a recurrent neural network), and leverages a deep architectural organization of the recurrent units. Efficiency is gained by many aspects, including the use of small and very sparse networks, where the weights of the recurrent units are left untrained under the stability condition introduced in this work.

DS

In this paper, we draw inspiration from Hamiltonian mechanics to train models that learn and respect exact conservation laws in an unsupervised manner.
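
To illustrate why a Hamiltonian parameterization respects conservation, here is a sketch in which a fixed scalar function stands in for the learned energy. The harmonic-oscillator `hamiltonian`, the finite-difference gradients, and the RK4 integrator are assumptions made for a dependency-free example; a real model would use a network and automatic differentiation.

```python
def hamiltonian(q, p):
    """Stand-in for a learned scalar energy H(q, p): unit mass on a unit spring."""
    return 0.5 * p * p + 0.5 * q * q

def vector_field(q, p, h=1e-5):
    """Hamilton's equations via central differences: dq/dt = dH/dp, dp/dt = -dH/dq."""
    dH_dq = (hamiltonian(q + h, p) - hamiltonian(q - h, p)) / (2 * h)
    dH_dp = (hamiltonian(q, p + h) - hamiltonian(q, p - h)) / (2 * h)
    return dH_dp, -dH_dq

def rk4_step(q, p, dt):
    """One classical Runge-Kutta step of the Hamiltonian vector field."""
    k1 = vector_field(q, p)
    k2 = vector_field(q + 0.5 * dt * k1[0], p + 0.5 * dt * k1[1])
    k3 = vector_field(q + 0.5 * dt * k2[0], p + 0.5 * dt * k2[1])
    k4 = vector_field(q + dt * k3[0], p + dt * k3[1])
    q_next = q + dt * (k1[0] + 2 * k2[0] + 2 * k3[0] + k4[0]) / 6
    p_next = p + dt * (k1[1] + 2 * k2[1] + 2 * k3[1] + k4[1]) / 6
    return q_next, p_next

q, p = 1.0, 0.0
energy_start = hamiltonian(q, p)
for _ in range(1000):            # integrate to t = 10
    q, p = rk4_step(q, p, 0.01)
energy_drift = abs(hamiltonian(q, p) - energy_start)
print(q, p, energy_drift)
```

Because the dynamics are derived from a single scalar, the energy stays nearly constant along the trajectory up to integration error.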

  • Deep Lagrangian Networks: Using Physics as Model Prior for Deep Learning: ICLR19

DS

We propose Deep Lagrangian Networks (DeLaN) as a deep network structure upon which Lagrangian Mechanics have been imposed. DeLaN can learn the equations of motion of a mechanical system (i.e., system dynamics) with a deep network efficiently while ensuring physical plausibility. The resulting DeLaN network performs very well at robot tracking control.

DS

We propose Lagrangian Neural Networks (LNNs), which can parameterize arbitrary Lagrangians using neural networks. In contrast to models that learn Hamiltonians, LNNs do not require canonical coordinates, and thus perform well in situations where canonical momenta are unknown or difficult to compute.

  • Simplifying Hamiltonian and Lagrangian Neural Networks via Explicit Constraints: NeurIPS20, code

DS

Reasoning about the physical world requires models that are endowed with the right inductive biases to learn the underlying dynamics. Recent works improve generalization for predicting trajectories by learning the Hamiltonian or Lagrangian of a system rather than the differential equations directly. While these methods encode the constraints of the systems using generalized coordinates, we show that embedding the system into Cartesian coordinates and enforcing the constraints explicitly with Lagrange multipliers dramatically simplifies the learning problem.

Neural Operators

  • Neural Operator: Learning Maps Between Function Spaces: arXiv21

We propose a generalization of neural networks to learn operators that map between infinite dimensional function spaces. We formulate the approximation of operators by composition of a class of linear integral operators and nonlinear activation functions, so that the composed operator can approximate complex nonlinear operators. We prove a universal approximation theorem for our construction. Furthermore, we introduce four classes of operator parameterizations: graph-based operators, low-rank operators, multipole graph-based operators, and Fourier operators and describe efficient algorithms for computing with each one.

  • Fourier Neural Operator for Parametric Partial Differential Equations: ICLR 2021

We formulate a new neural operator by parameterizing the integral kernel directly in Fourier space, allowing for an expressive and efficient architecture.
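
A single-channel sketch of a kernel parameterized in Fourier space: transform, scale the lowest modes, transform back. The naive DFT, the function names, and the fixed weights are assumptions for illustration; a full layer has more parts (learned complex weights, channel mixing, a pointwise path, a nonlinearity) that are omitted here.

```python
import cmath
import math

def dft(signal):
    """Naive discrete Fourier transform (O(n^2)), fine for small n."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse of dft."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def spectral_layer(signal, weights):
    """Multiply the lowest len(weights) modes by weights and zero the rest.

    Assumption: len(weights) <= len(signal) // 2 and weights[0] is real.
    """
    n, modes = len(signal), len(weights)
    spectrum = dft(signal)
    out = [0j] * n
    for k in range(modes):
        out[k] = weights[k] * spectrum[k]
        if k > 0:                      # mirrored mode keeps the output real
            out[n - k] = weights[k].conjugate() * spectrum[n - k]
    return [value.real for value in idft(out)]

n = 16
low = [math.sin(2 * math.pi * t / n) for t in range(n)]
high = [0.5 * math.sin(2 * math.pi * 6 * t / n) for t in range(n)]
signal = [a + b for a, b in zip(low, high)]
filtered = spectral_layer(signal, weights=[1.0, 1.0, 1.0])   # placeholder weights
max_error = max(abs(a - b) for a, b in zip(filtered, low))
print(max_error)
```

With unit weights on three modes the layer keeps the low-frequency component and discards the high-frequency one; learned weights would instead reshape each retained mode.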

  • FourCastNet: A Global Data-driven High-resolution Weather Model using Adaptive Fourier Neural Operators

FourCastNet, short for Fourier Forecasting Neural Network, is a global data-driven weather forecasting model that provides accurate short to medium-range global predictions at 0.25° resolution. FourCastNet accurately forecasts high-resolution, fast-timescale variables such as the surface wind speed, precipitation, and atmospheric water vapor.

  • Transform Once: Efficient Operator Learning in Frequency Domain

This work introduces a blueprint for frequency domain learning through a single transform: transform once (T1). To enable efficient, direct learning in the frequency domain we develop a variance preserving weight initialization scheme and address the open problem of choosing a transform. Our results noticeably streamline the design process of frequency-domain models, pruning redundant transforms, and leading to speedups of 3x to 10x that increase with data resolution and model size. We perform extensive experiments on learning to solve partial differential equations, including incompressible Navier-Stokes, turbulent flows around airfoils, and high-resolution video of smoke dynamics. T1 models improve on the test performance of SOTA FDMs while requiring significantly less computation, with over 20% reduction in predictive error across tasks.

Neural ODEs

  • Neural Ordinary Differential Equations (best paper award): NeurIPS18

T TS

We introduce a new family of deep neural network models. Instead of specifying a discrete sequence of hidden layers, we parameterize the derivative of the hidden state using a neural network. We also construct continuous normalizing flows, a generative model that can train by maximum likelihood, without partitioning or ordering the data dimensions.
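
As a rough sketch of "parameterizing the derivative of the hidden state", the forward pass becomes an initial value problem. The one-unit `dynamics` function, the parameter tuple `theta`, and the fixed-step RK4 solver are assumptions chosen to keep the example dependency-free; libraries such as torchdiffeq provide adaptive solvers and gradients.

```python
import math

def dynamics(h, t, theta):
    """Tiny stand-in network: dh/dt = w2 * tanh(w1 * h + b1)."""
    w1, b1, w2 = theta
    return w2 * math.tanh(w1 * h + b1)

def odeint_rk4(f, h0, t0, t1, theta, steps=100):
    """Integrate dh/dt = f(h, t, theta) from t0 to t1 with fixed-step RK4."""
    h, t = h0, t0
    dt = (t1 - t0) / steps
    for _ in range(steps):
        k1 = f(h, t, theta)
        k2 = f(h + 0.5 * dt * k1, t + 0.5 * dt, theta)
        k3 = f(h + 0.5 * dt * k2, t + 0.5 * dt, theta)
        k4 = f(h + dt * k3, t + dt, theta)
        h += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
        t += dt
    return h

# Hypothetical parameters; in training these would be learned.
output = odeint_rk4(dynamics, h0=0.8, t0=0.0, t1=1.0, theta=(1.0, 0.0, -1.0))
identity = odeint_rk4(dynamics, h0=0.8, t0=0.0, t1=1.0, theta=(1.0, 0.0, 0.0))
print(output, identity)
```

The "depth" of the model is the integration interval: with zero output weight the block is the identity map, and otherwise the input is transported along the learned vector field.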

T DS IC TS

Continuous deep learning architectures have recently re-emerged as Neural Ordinary Differential Equations (Neural ODEs). This infinite-depth approach theoretically bridges the gap between deep learning and dynamical systems, offering a novel perspective. However, deciphering the inner workings of these models is still an open challenge, as most applications apply them as generic black-box modules. In this work we "open the box", further developing the continuous-depth formulation with the aim of clarifying the influence of several design choices on the underlying dynamics.

  • Differentiable Multiple Shooting Layers: NeurIPS21

We detail a novel class of implicit neural models. Leveraging time-parallel methods for differential equations, Multiple Shooting Layers (MSLs) seek solutions of initial value problems via parallelizable root-finding algorithms. MSLs broadly serve as drop-in replacements for neural ordinary differential equations (Neural ODEs) with improved efficiency in number of function evaluations (NFEs) and wall-clock inference time.

IC

We show that Neural Ordinary Differential Equations (ODEs) learn representations that preserve the topology of the input space and prove that this implies the existence of functions Neural ODEs cannot represent. To address these limitations, we introduce Augmented Neural ODEs which, in addition to being more expressive models, are empirically more stable, generalize better and have a lower computational cost than Neural ODEs.
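
One way to see the limitation: trajectories of a one-dimensional ODE cannot cross, so no 1-D flow can send x to -x for every input. With one extra zero-initialized channel the map becomes a half-turn rotation. The hand-picked `rotate_field` below is an illustrative assumption, not a learned model.

```python
import math

def rotate_field(state):
    """Linear field on (x, a) that rotates the plane by pi over unit time."""
    x, a = state
    return (-math.pi * a, math.pi * x)

def flow(state, steps=200):
    """Fixed-step RK4 from t = 0 to t = 1."""
    dt = 1.0 / steps
    for _ in range(steps):
        k1 = rotate_field(state)
        k2 = rotate_field(tuple(s + 0.5 * dt * k for s, k in zip(state, k1)))
        k3 = rotate_field(tuple(s + 0.5 * dt * k for s, k in zip(state, k2)))
        k4 = rotate_field(tuple(s + dt * k for s, k in zip(state, k3)))
        state = tuple(s + dt * (a + 2 * b + 2 * c + d) / 6
                      for s, a, b, c, d in zip(state, k1, k2, k3, k4))
    return state

def augmented_map(x):
    """Append a zero channel, flow, then read back the original coordinate."""
    return flow((x, 0.0))[0]

flipped_pos = augmented_map(1.0)
flipped_neg = augmented_map(-1.0)
print(flipped_pos, flipped_neg)
```

The two inputs swap sides without their trajectories crossing, because they pass each other in the added dimension.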

  • Latent ODEs for Irregularly-Sampled Time Series: NeurIPS19

TS

  • ODE2VAE: Deep generative second order ODEs with Bayesian neural networks: NeurIPS19

TS

  • Symplectic ODE-Net: Learning Hamiltonian Dynamics with Control: arXiv19

  • Stable Neural Flows: arXiv20

DS

  • On Second Order Behaviour in Augmented Neural ODEs: NeurIPS20

TS

  • Neural Hybrid Automata: Learning Dynamics with Multiple Modes and Stochastic Transitions: NeurIPS21

Effective control and prediction of dynamical systems often require appropriate handling of continuous-time and discrete, event-triggered processes. Stochastic hybrid systems (SHSs), common across engineering domains, provide a formalism for dynamical systems subject to discrete, possibly stochastic, state jumps and multi-modal continuous-time flows. Despite the versatility and importance of SHSs across applications, a general procedure for the explicit learning of both discrete events and multi-mode continuous dynamics remains an open problem. This work introduces Neural Hybrid Automata (NHAs), a recipe for learning SHS dynamics without a priori knowledge on the number of modes and inter-modal transition dynamics. NHAs provide a systematic inference method based on normalizing flows, neural differential equations and self-supervision.

Training of Neural ODEs

  • Accelerating Neural ODEs with Spectral Elements: arXiv19

NM

  • Adaptive Checkpoint Adjoint Method for Gradient Estimation in Neural ODE: ICML20

NM IC

  • MALI: A memory efficient and reverse accurate integrator for Neural ODEs: ICLR21

T NM IC

Existing implementations of the adjoint method suffer from inaccuracy in reverse-time trajectory, while the naive method and the adaptive checkpoint adjoint method (ACA) have a memory cost that grows with integration time. In this project, based on the asynchronous leapfrog (ALF) solver, we propose the Memory-efficient ALF Integrator (MALI), which has a constant memory cost w.r.t. the number of solver steps in integration, similar to the adjoint method, and guarantees accuracy in reverse-time trajectory (hence accuracy in gradient estimation).

Speeding up continuous models

  • How to Train Your Neural ODE: ICML20

IC

  • Learning Differential Equations that are Easy to Solve: NeurIPS20

NM

  • Hypersolvers: Toward Fast Continuous-Depth Models: NeurIPS20

NM

  • "Hey, that's not an ODE": Faster ODE Adjoints with 12 Lines of Code: arXiv20

NM

Neural differential equations may be trained by backpropagating gradients via the adjoint method. Here, we demonstrate that the particular structure of the adjoint equations makes the usual choices of norm (such as L2) unnecessarily stringent. By replacing it with a more appropriate (semi)norm, fewer steps are unnecessarily rejected and the backpropagation is made faster.
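
The effect of swapping the norm can be sketched with a toy step-acceptance test. Which channels the seminorm ignores is an assumption here (the parameter-gradient components of the backward pass), as are the RMS norm, the tolerance, and the error values; real adaptive solvers also scale errors by the state magnitude.

```python
import math

def rms_norm(values):
    """Root-mean-square norm of a list of error estimates."""
    return math.sqrt(sum(v * v for v in values) / len(values)) if values else 0.0

def full_norm(err_state, err_adjoint, err_param_grad):
    """Conventional choice: every channel of the backward pass is error-controlled."""
    return rms_norm(err_state + err_adjoint + err_param_grad)

def seminorm(err_state, err_adjoint, err_param_grad):
    """Assumed seminorm: ignore the parameter-gradient channels when judging a step."""
    return rms_norm(err_state + err_adjoint)

def accept_step(error_norm, tol=1e-3):
    """A step is accepted when its estimated error is within tolerance."""
    return error_norm <= tol

# Hypothetical local error estimates for one backward step.
err_state = [2e-4, -1e-4]
err_adjoint = [3e-4, 1e-4]
err_param_grad = [5e-3, -4e-3, 6e-3, 2e-3]   # large, but not fed back into the dynamics

accepted_full = accept_step(full_norm(err_state, err_adjoint, err_param_grad))
accepted_semi = accept_step(seminorm(err_state, err_adjoint, err_param_grad))
print(accepted_full, accepted_semi)
```

The same step is rejected under the full norm and accepted under the seminorm, which is the mechanism by which fewer steps are rejected.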

  • Interpolation Technique to Speed Up Gradients Propagation in Neural ODEs: NeurIPS20

NM IC

We propose a simple interpolation-based method for the efficient approximation of gradients in neural ODE models. We compare it with the reverse dynamic method (known in the literature as “adjoint method”) to train neural ODEs on classification, density estimation, and inference approximation tasks.

  • Opening the Blackbox: Accelerating Neural Differential Equations by Regularizing Internal Solver Heuristics: ICML21

NM

Can we force the NDE to learn the version with the fewest steps while not increasing the training cost? Current strategies to overcome slow prediction require high order automatic differentiation, leading to significantly higher training time. We describe a novel regularization method that uses the internal cost heuristics of adaptive differential equation solvers combined with discrete adjoint sensitivities.

Control with Neural ODEs

  • Model-based Reinforcement Learning for Semi-Markov Decision Processes with Neural ODEs: NeurIPS20

In this paper, we take a model-based approach to continuous-time RL, modeling the dynamics via neural ordinary differential equations (ODEs). Not only is this more sample efficient than model-free approaches, but it allows us to efficiently adapt policies learned using one schedule of interactions with the environment for another.

  • Optimal Energy Shaping via Neural Approximators: arXiv20

We introduce optimal energy shaping as an enhancement of classical passivity-based control methods. A promising feature of passivity theory, alongside stability, has traditionally been claimed to be intuitive performance tuning along the execution of a given task. However, a systematic approach to adjust performance within a passive control framework has yet to be developed, as each method relies on few and problem-specific practical insights. Here, we cast the classic energy-shaping control design process in an optimal control framework; once a task-dependent performance metric is defined, an optimal solution is systematically obtained through an iterative procedure relying on neural networks and gradient-based optimization.

Neural GDEs

  • Graph Neural Ordinary Differential Equations (spotlight): AAAI DLGMA20

DS TS

We introduce the framework of continuous–depth graph neural networks (GNNs). Neural graph ordinary differential equations (Neural GDEs) are formalized as the counterpart to GNNs where the input–output relationship is determined by a continuum of GNN layers, blending discrete topological structures and differential equations. We further introduce general Hybrid Neural GDE models as hybrid dynamical systems.

  • Continuous–Depth Neural Models for Dynamic Graph Prediction: arXiv21, extended version of "Graph Neural Ordinary Differential Equations"

DS TS

Additional Neural GDE variants are developed to tackle the spatio–temporal setting of dynamic graphs. The evaluation protocol for Neural GDEs spans several application domains, including traffic forecasting and prediction in biological networks.

  • GRAND: Graph Neural Diffusion: arXiv21

We present Graph Neural Diffusion (GRAND) that approaches deep learning on graphs as a continuous diffusion process and treats Graph Neural Networks (GNNs) as discretisations of an underlying PDE.

Neural SDEs

  • Neural SDE: Stabilizing Neural ODE Networks with Stochastic Noise: arXiv19

  • Neural Jump Stochastic Differential Equations: arXiv19

TS

  • Towards Robust and Stable Deep Learning Algorithms for Forward Backward Stochastic Differential Equations: arXiv19

T

  • Scalable Gradients and Variational Inference for Stochastic Differential Equations: AISTATS20

  • Score-Based Generative Modeling through Stochastic Differential Equations (oral): ICLR21

IC

We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.

  • Efficient and Accurate Gradients for Neural SDEs: NeurIPS21

First, we introduce the reversible Heun method. This is a new SDE solver that is algebraically reversible: eliminating numerical gradient errors, and the first such solver of which we are aware. Moreover it requires half as many function evaluations as comparable solvers, giving up to a 1.98× speedup. Second, we introduce the Brownian Interval: a new, fast, memory efficient, and exact way of sampling and reconstructing Brownian motion.

Neural CDEs

  • Neural Controlled Differential Equations for Irregular Time Series (spotlight): NeurIPS20

T TS

We demonstrate how controlled differential equations may extend the Neural ODE model, which we refer to as the neural controlled differential equation (Neural CDE) model. Just as Neural ODEs are the continuous analogue of a ResNet, the Neural CDE is the continuous analogue of an RNN.
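
The RNN analogy shows up in the simplest discretization of dz = f(z) dX, where the hidden state is updated by increments of the data path. This is a sketch under assumptions: the Euler scheme, the hand-picked `linear_field` in place of a network, and a raw list of observations in place of an interpolated path.

```python
def cde_solve(f, z0, path):
    """Euler scheme for dz = f(z) dX: z_{k+1} = z_k + f(z_k) (X_{k+1} - X_k)."""
    z = list(z0)
    for x_prev, x_next in zip(path, path[1:]):
        dx = [b - a for a, b in zip(x_prev, x_next)]
        matrix = f(z)                       # shape: len(z) x len(dx)
        z = [zi + sum(m * d for m, d in zip(row, dx)) for zi, row in zip(z, matrix)]
    return z

def linear_field(z):
    """Hand-picked stand-in for a learned network: f(z) = [[z]]."""
    return [[z[0]]]

# Irregularly spaced observations of a one-channel path running from 0 to 1.
n = 2000
path = [[(i / n) ** 2] for i in range(n + 1)]
z_final = cde_solve(linear_field, [1.0], path)[0]
print(z_final)   # close to e, the solution of dz = z dX as X runs from 0 to 1

# A path that never changes leaves the hidden state untouched.
z_constant = cde_solve(linear_field, [1.0], [[0.3]] * 5)[0]
print(z_constant)
```

Only the increments of the path drive the state, so uneven spacing between observations needs no special handling in the update rule.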

  • Neural CDEs for Long Time Series via the Log-ODE Method: arXiv20

T TS

  • Neural Controlled Differential Equations for Online Prediction Tasks: arXiv21

T TS

First, we identify several theoretical conditions that interpolation schemes for Neural CDEs should satisfy, such as boundedness and uniqueness. Second, we use these to motivate the introduction of new schemes that address these conditions, offering in particular measurability (for online prediction), and smoothness (for speed).

Generative Models

Normalizing Flows

  • Monge-Ampère Flow for Generative Modeling: arXiv18

IC

  • FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models: ICLR19

IC

  • Equivariant Flows: sampling configurations for multi-body systems with symmetric energies: arXiv18

  • Flows for simultaneous manifold learning and density estimation: NeurIPS20

T

We introduce manifold-learning flows (M-flows), a new class of generative models that simultaneously learn the data manifold as well as a tractable probability density on that manifold. We argue why such models should not be trained by maximum likelihood alone and present a new training algorithm that separates manifold and density updates.

  • TrajectoryNet: A Dynamic Optimal Transport Network for Modeling Cellular Dynamics: arXiv20

  • Convex Potential Flows: Universal Probability Distributions with Optimal Transport and Convex Optimization: arXiv20

IC

CP-Flows are the gradient map of a strongly convex neural potential function. The convexity implies invertibility and allows us to resort to convex optimization to solve the convex conjugate for efficient inversion.
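
A one-dimensional sketch of inverting a gradient map by convex optimization. The specific potential `F(x) = x^2 / 2 + softplus(2x)` and the Newton iteration are illustrative assumptions standing in for a learned convex network and the optimizer used in practice.

```python
import math

def softplus(u):
    """Numerically stable log(1 + exp(u))."""
    return math.log1p(math.exp(-abs(u))) + max(u, 0.0)

def sigmoid(u):
    return 1.0 / (1.0 + math.exp(-u))

def potential(x):
    """Strongly convex stand-in for a learned potential: F(x) = x^2 / 2 + softplus(2x)."""
    return 0.5 * x * x + softplus(2.0 * x)

def forward(x):
    """The flow is the gradient map: f(x) = F'(x) = x + 2 * sigmoid(2x)."""
    return x + 2.0 * sigmoid(2.0 * x)

def inverse(y, iters=50):
    """Invert by minimizing the convex objective F(x) - x * y with Newton steps."""
    x = 0.0
    for _ in range(iters):
        grad = forward(x) - y
        s = sigmoid(2.0 * x)
        hess = 1.0 + 4.0 * s * (1.0 - s)     # F''(x) >= 1, so the step is well defined
        x -= grad / hess
    return x

x = -1.3
y = forward(x)
x_recovered = inverse(y)
objective_at_min = potential(x_recovered) - x_recovered * y
objective_nearby = potential(x_recovered + 0.1) - (x_recovered + 0.1) * y
print(y, x_recovered, objective_at_min, objective_nearby)
```

Strong convexity makes `forward` strictly increasing, so the minimizer of `F(x) - x * y` is unique and equals the preimage of `y`.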

Diffusion Models

  • Score-Based Generative Modeling through Stochastic Differential Equations (best paper award): ICLR21

IC

Creating noise from data is easy; creating data from noise is generative modeling. We present a stochastic differential equation (SDE) that smoothly transforms a complex data distribution to a known prior distribution by slowly injecting noise, and a corresponding reverse-time SDE that transforms the prior distribution back into the data distribution by slowly removing the noise.
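
A toy sketch of the forward and reverse-time SDE pair on one-dimensional Gaussian data, where the score is known in closed form. The constant noise rate `BETA`, the data parameters, and the Euler-Maruyama sampler are assumptions for illustration; in the paper's setting a network is trained to approximate the score.

```python
import math
import random

BETA, T = 8.0, 1.0            # constant noise rate and time horizon (assumed values)
DATA_MEAN, DATA_STD = 2.0, 0.5

def marginal(t):
    """Mean and variance of x_t under the forward SDE dx = -0.5*BETA*x dt + sqrt(BETA) dW."""
    decay = math.exp(-BETA * t)
    return DATA_MEAN * math.sqrt(decay), DATA_STD ** 2 * decay + (1.0 - decay)

def score(x, t):
    """Exact score of the Gaussian marginal; a network would approximate this."""
    mean, var = marginal(t)
    return -(x - mean) / var

def sample_reverse(rng, steps=300):
    """Euler-Maruyama on the reverse-time SDE, from t = T down to t = 0."""
    dt = T / steps
    x = rng.gauss(0.0, 1.0)                       # draw from the prior
    for i in range(steps):
        t = T - i * dt
        drift = -0.5 * BETA * x - BETA * score(x, t)
        x = x - drift * dt + math.sqrt(BETA * dt) * rng.gauss(0.0, 1.0)
    return x

rng = random.Random(0)
samples = [sample_reverse(rng) for _ in range(1000)]
sample_mean = sum(samples) / len(samples)
sample_std = math.sqrt(sum((s - sample_mean) ** 2 for s in samples) / len(samples))
print(sample_mean, sample_std)
```

Starting from standard normal noise, the reverse-time simulation should return samples whose mean and spread are close to those of the data distribution, up to discretization and sampling error.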

  • Denoising Diffusion Implicit Models

Denoising diffusion probabilistic models (DDPMs) have achieved high quality image generation without adversarial training, yet they require simulating a Markov chain for many steps to produce a sample. To accelerate sampling, we present denoising diffusion implicit models (DDIMs), a more efficient class of iterative implicit probabilistic models with the same training procedure as DDPMs. In DDPMs, the generative process is defined as the reverse of a Markovian diffusion process.

Applications

  • Learning Dynamics of Attention: Human Prior for Interpretable Machine Reasoning: NeurIPS19

Deep Learning Methods for Differential Equations

Solving Differential Equations

  • PDE-Net: Learning PDEs From Data: ICML18

Model Discovery

  • Universal Differential Equations for Scientific Machine Learning: arXiv20

NM

Dynamical System View of Deep Learning

Recurrent Neural Networks

T

  • AntisymmetricRNN: A Dynamical System View on Recurrent Neural Networks: ICLR19

  • Recurrent Neural Networks in the Eye of Differential Equations: arXiv19

T

  • Visualizing memorization in RNNs: distill19

  • One step back, two steps forward: interference and learning in recurrent neural networks: arXiv18

  • Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics: arXiv19

  • System Identification with Time-Aware Neural Sequence Models: AAAI20

  • Universality and Individuality in recurrent networks: NeurIPS19

Theory and Perspectives

T

  • Deep Learning Theory Review: An Optimal Control and Dynamical Systems Perspective: arXiv19

T

  • Stable Architectures for Deep Neural Networks: IP17

T

  • Beyond Finite Layer Neural Networks: Bridging Deep Architectures and Numerical Differential Equations: ICML18

  • Review: Ordinary Differential Equations For Deep Learning: arXiv19

Optimization

  • Gradient and Hamiltonian Dynamics Applied to Learning in Neural Networks: NIPS96

  • Maximum Principle Based Algorithms for Deep Learning: JMLR17

  • Hamiltonian Descent Methods: arXiv18

T

  • Port-Hamiltonian Approach to Neural Network Training: CDC19, code

T

  • An Optimal Control Approach to Deep Learning and Applications to Discrete-Weight Neural Networks: arXiv19

  • Optimizing Millions of Hyperparameters by Implicit Differentiation: arXiv19

  • Shadowing Properties of Optimization Algorithms: NeurIPS19

Software and Libraries

Python

  • torchdyn: PyTorch library for all things neural differential equations. repo, docs
  • torchdiffeq: Differentiable ODE solvers with full GPU support and O(1)-memory backpropagation: repo
  • torchsde: Stochastic differential equation (SDE) solvers with GPU support and efficient sensitivity analysis: repo
  • torchcde: GPU-capable solvers for controlled differential equations (CDEs): repo
  • torchSODE: PyTorch Block-Diagonal ODE solver: repo
  • neurodiffeq: A light-weight & flexible library for solving differential equations using neural networks based on PyTorch: repo

Julia

Neural differential equation solvers with O(1) backprop, GPUs, and stiff+non-stiff DE solvers. Supports stiff and non-stiff neural ordinary differential equations (neural ODEs), neural stochastic differential equations (neural SDEs), neural delay differential equations (neural DDEs), neural partial differential equations (neural PDEs), and neural jump stochastic differential equations (neural jump diffusions). All of these can be solved with high order methods with adaptive time-stepping and automatic stiffness detection to switch between methods.

  • NeuralNetDiffEq: Implementations of ODE, SDE, and PDE solvers via deep neural networks: repo

Websites and Blogs

  • Scientific ML Blog (Chris Rackauckas and SciML): link