Normalizing Flows

Example of a normalizing flow (Real NVP) learning the distribution on the left.

Intro

Normalizing flows operate by pushing a simple density through a series of transformations to produce a richer, potentially more multi-modal distribution. -- Papamakarios et al. 2021

These transformations have to be bijective and differentiable, with a differentiable inverse and a nonzero functional (Jacobian) determinant, $\det\left((T^{-1})^{'}\right)\neq 0$. In short, each transformation is a diffeomorphism. (Note that in the NF literature the terms bijector and diffeomorphism are used interchangeably.)
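The reason these conditions matter is the change-of-variables formula, sketched here for reference. The symbols are assumptions of this note rather than notation from the text above: $X$ is the base variable with density $p_{X}$, $Y=T(X)$ is the transformed variable, and $J_{T^{-1}}(y)$ denotes the Jacobian of $T^{-1}$ at $y$.

$p_{Y}(y) = p_{X}\left(T^{-1}(y)\right)\left|\det J_{T^{-1}}(y)\right|$

Taking logarithms gives the form that is usually implemented:

$\log p_{Y}(y) = \log p_{X}\left(T^{-1}(y)\right) + \log\left|\det J_{T^{-1}}(y)\right|$

For a chain $T = T_{K}\circ\dots\circ T_{1}$ the log-determinant terms of the individual transformations simply add up, which is why each bijector only needs to report its own log-determinant.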

Building a custom Bijector with distrax

We start with a linear map given by:

$T:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}, \left(\begin{array}{c} x_{1}\\ x_{2} \end{array}\right) \mapsto \left(\begin{array}{cc} \cos(\theta) & -\sin(\theta)\\ \sin(\theta) & \cos(\theta) \end{array}\right) \left(\begin{array}{c} x_{1}\\ x_{2} \end{array}\right)$

with inverse $T^{-1}$ :

$T^{-1}:\mathbb{R}^{2} \rightarrow \mathbb{R}^{2}, \left(\begin{array}{c} x_{1}\\ x_{2} \end{array}\right) \mapsto \left(\begin{array}{cc} \cos(\theta) & \sin(\theta)\\ -\sin(\theta) & \cos(\theta) \end{array}\right) \left(\begin{array}{c} x_{1}\\ x_{2} \end{array}\right)$

and functional determinant $\det(T^{'})=\cos^{2}(\theta)+\sin^{2}(\theta)=1$ .

In distrax we can construct the above map by subclassing the Bijector class.

import distrax
import jax.numpy as jnp

class OrthogonalProjection2D(distrax.Bijector):
    def __init__(self, theta):
        super().__init__(event_ndims_in=1, event_ndims_out=1)
        self.thetas = theta
        self.sin_theta = jnp.sin(theta)
        self.cos_theta = jnp.cos(theta)
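        # Rotation matrix, stored transposed so that row vectors (and batches
        # of shape [..., 2]) can be rotated with x @ R.
        self.R = jnp.array(
            [[self.cos_theta, -self.sin_theta], [self.sin_theta, self.cos_theta]]
        ).T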

    def forward(self, x):
        return jnp.matmul(x, self.R)

    def inverse(self, x):
        return jnp.matmul(x, self.R.T)

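    def forward_and_log_det(self, x):
        y = self.forward(x)
        # A rotation has determinant 1, so the log-determinant is log(1) = 0.
        logdet = jnp.zeros(x.shape[:-1])
        return y, logdet

    def inverse_and_log_det(self, x):
        y = self.inverse(x)
        # The inverse rotation also has determinant 1, so its log-determinant is 0.
        logdet = jnp.zeros(x.shape[:-1])
        return y, logdet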
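As a quick sanity check of the conventions used in the class, the following sketch reproduces its matrix algebra in plain JAX, without distrax. The variable names and the test points are chosen here for illustration only. It checks that multiplying row vectors by the transposed matrix agrees with the column-vector definition of $T$, that the inverse undoes the forward pass, and that the log-determinant is numerically zero.

```python
import jax.numpy as jnp

theta = jnp.pi / 4  # 45 degrees; any angle would do

# Rotation matrix acting on column vectors, as in the definition of T.
A = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta), jnp.cos(theta)]])
R = A.T  # the transposed matrix, as stored by the bijector

# A small batch of row vectors (illustrative values).
x = jnp.array([[1.0, 0.0], [0.5, -2.0]])

y = x @ R            # forward pass for the whole batch
y_ref = (A @ x.T).T  # same map written with column vectors
x_back = y @ R.T     # inverse pass

# Log of the absolute determinant of the rotation matrix.
sign, logabsdet = jnp.linalg.slogdet(A)

print(jnp.allclose(y, y_ref, atol=1e-6))
print(jnp.allclose(x_back, x, atol=1e-5))
print(float(logabsdet))
```

The log-determinant comes out as zero up to floating-point error, which is the value a bijector for this map should return from `forward_and_log_det` and `inverse_and_log_det`.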

Transforming a multivariate Gaussian $X$ with independent components and covariance $\Sigma=cov(X)$ with the OrthogonalProjection2D for $\theta=45^{\circ}$ yields a multivariate Gaussian whose components are no longer independent (provided the variances of the two components differ), as can be seen below. Since the above bijector is linear, this is expected: $cov(Y)=cov(TX)=T\Sigma T^{\top}$, which in general is no longer diagonal.
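The covariance identity can be checked numerically. The sketch below uses plain JAX and an assumed base covariance of $\mathrm{diag}(4, 1)$, which is chosen here for illustration and is not taken from the original experiment. It compares the analytic covariance of the rotated variable with a Monte Carlo estimate.

```python
import jax
import jax.numpy as jnp

theta = jnp.pi / 4  # 45 degrees
T = jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
               [jnp.sin(theta), jnp.cos(theta)]])

# Assumed base covariance: independent components with different variances.
sigma = jnp.diag(jnp.array([4.0, 1.0]))

# Analytic covariance of Y = T X.
cov_y = T @ sigma @ T.T

# Monte Carlo check: sample X ~ N(0, sigma), rotate, estimate the covariance.
key = jax.random.PRNGKey(0)
x = jax.random.normal(key, (100_000, 2)) * jnp.sqrt(jnp.diag(sigma))
y = x @ T.T  # row-vector form of y = T x
cov_y_hat = jnp.cov(y, rowvar=False)

print(cov_y)      # approximately [[2.5, 1.5], [1.5, 2.5]]
print(cov_y_hat)  # close to cov_y, up to sampling noise
```

The off-diagonal entry equals $\cos(\theta)\sin(\theta)(\sigma_{1}^{2}-\sigma_{2}^{2})$, so it vanishes when both variances are equal: rotating an isotropic Gaussian leaves it unchanged.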

In the image below we chained a shift, a scale and the orthogonal projector (OrthogonalProjection2D). The left-hand side shows the true distribution and the right-hand side the distribution inferred by maximum likelihood for the shift parameter $a$, the scale parameter $b$ and the rotation parameter $\theta$:
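A maximum-likelihood fit of this kind could look roughly like the sketch below. It is written in plain JAX rather than distrax and rests on several assumptions that are not stated in the text: the order of the chain (scale, then shift, then rotate), a standard normal base distribution, a log-parameterization of $b$ to keep it positive, synthetic data, and plain gradient descent with a hand-picked learning rate. All function and variable names are hypothetical.

```python
import jax
import jax.numpy as jnp

def rotation(theta):
    # 2D rotation matrix for column vectors.
    return jnp.array([[jnp.cos(theta), -jnp.sin(theta)],
                      [jnp.sin(theta), jnp.cos(theta)]])

def forward(params, z):
    # Assumed order: scale by b, shift by a, then rotate by theta.
    x = z * jnp.exp(params["log_b"]) + params["a"]
    return x @ rotation(params["theta"]).T

def log_prob(params, y):
    # Invert the chain step by step.
    x = y @ rotation(params["theta"])  # undo the rotation (log-det 0)
    z = (x - params["a"]) / jnp.exp(params["log_b"])
    # Log-density of a 2D standard normal at z.
    base = -0.5 * jnp.sum(z ** 2, axis=-1) - jnp.log(2 * jnp.pi)
    # Log-determinant of the inverse scaling is -sum(log b).
    return base - jnp.sum(params["log_b"])

def nll(params, y):
    # Average negative log-likelihood of the data.
    return -jnp.mean(log_prob(params, y))

# Synthetic "true" distribution (illustrative values only).
true_params = {"a": jnp.array([1.0, -0.5]),
               "log_b": jnp.log(jnp.array([2.0, 0.5])),
               "theta": jnp.array(jnp.pi / 4)}
data = forward(true_params, jax.random.normal(jax.random.PRNGKey(0), (2000, 2)))

LEARNING_RATE = 0.02

@jax.jit
def step(params, y):
    # One step of plain gradient descent on the negative log-likelihood.
    grads = jax.grad(nll)(params, y)
    return jax.tree_util.tree_map(lambda p, g: p - LEARNING_RATE * g, params, grads)

params = {"a": jnp.zeros(2), "log_b": jnp.zeros(2), "theta": jnp.array(0.0)}
initial_loss = nll(params, data)
for _ in range(1000):
    params = step(params, data)
final_loss = nll(params, data)

print(float(initial_loss), float(final_loss))
```

Note that the parameters are only identified up to symmetries of the base distribution (for example, rotating by an additional $90^{\circ}$ and swapping the two scales gives the same density), so the fitted $\theta$ need not match the value used to generate the data even when the fitted density does.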