
Maximum Mean Discrepancy (MMD), Kernel Stein Discrepancy (KSD), and Fisher Divergence


Distribution Discrepancies

  • Maximum Mean Discrepancy
  • Kernel Stein Discrepancy
  • Fisher Divergence

For more information, check out my KSD blog post!

To get set up:

  1. Install poetry
     pip install poetry
  2. Install dependencies
     poetry install

Integral Probability Metrics (IPMs)

An Integral Probability Metric (IPM) compares two distributions P and Q through the expectations of a function f drawn from a function space F. In particular, we search for f*, a witness function: the f in F that maximally exposes the difference between x ~ P and y ~ Q.
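For reference, the standard definition of an IPM can be written as below. The notation (d_F for the metric, E for expectation) is an assumption chosen here and may differ from the notation used in the notebooks.

```latex
d_{\mathcal{F}}(P, Q)
  = \sup_{f \in \mathcal{F}}
    \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \right|,
\qquad
f^{*} \in \arg\max_{f \in \mathcal{F}}
    \left| \mathbb{E}_{x \sim P}[f(x)] - \mathbb{E}_{y \sim Q}[f(y)] \right|
```

The choice of F determines the metric: a richer function space can expose more differences between P and Q, but is harder to optimise over.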

We can visualise possible witness functions when P is Gaussian and Q follows a Laplace distribution

The Maximum Mean Discrepancy (MMD)

One example of an IPM is the Maximum Mean Discrepancy (MMD), which takes F to be the unit ball of a reproducing kernel Hilbert space. The MMD can be estimated directly from two sets of samples, without access to either density. As an example, we can calculate the MMD between MNIST images of two binary digits.
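As a rough illustration of a sample-based MMD, here is a minimal NumPy sketch. It is not the code from this repository: the function names (`rbf_kernel`, `mmd_squared`), the Gaussian (RBF) kernel, and the use of the biased V-statistic estimator are all assumptions made for the example, and small synthetic samples stand in for the MNIST images.

```python
import numpy as np


def rbf_kernel(x, y, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between rows of x (n, d) and y (m, d)."""
    sq_dists = (
        np.sum(x**2, axis=1)[:, None]
        + np.sum(y**2, axis=1)[None, :]
        - 2.0 * x @ y.T
    )
    # Clip tiny negative values caused by floating-point round-off.
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale**2))


def mmd_squared(x, y, lengthscale=1.0):
    """Biased (V-statistic) estimate of MMD^2 between sample sets x and y."""
    k_xx = rbf_kernel(x, x, lengthscale)
    k_yy = rbf_kernel(y, y, lengthscale)
    k_xy = rbf_kernel(x, y, lengthscale)
    return k_xx.mean() + k_yy.mean() - 2.0 * k_xy.mean()


# Hypothetical stand-in data: two well-separated Gaussians.
rng = np.random.default_rng(0)
p_samples = rng.normal(loc=0.0, size=(500, 1))
q_samples = rng.normal(loc=3.0, size=(500, 1))

mmd_identical = mmd_squared(p_samples, p_samples)  # same sample set twice
mmd_shifted = mmd_squared(p_samples, q_samples)    # clearly different sets
```

The `lengthscale` argument is the kernel hyper-parameter in this sketch: very small or very large values make all kernel entries look alike, which weakens the ability of the MMD to tell the two sample sets apart.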

We can see the effect of the kernel hyper-parameters on the ability of the MMD to discriminate between MNIST digits

The Langevin Kernel Stein Discrepancy (KSD)

The Langevin Kernel Stein Discrepancy compares a set of samples from an unknown distribution with the density of a known distribution. When the MMD is computed with the Langevin Stein Kernel, it can be viewed as a sample-based approximation of the KSD. We can visualise this by seeing the MMD approach the KSD:

When using the Langevin Stein Kernel, we can see the MMD is a numerical approximation of the KSD
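The relationship between the two quantities can be sketched numerically. The example below is an illustration under stated assumptions, not the repository's implementation: it assumes an RBF base kernel, a standard normal as the known distribution (so the score is simply -x), a V-statistic estimator, and hypothetical function names throughout.

```python
import numpy as np


def langevin_stein_kernel(x, y, score, lengthscale=1.0):
    """Gram matrix of the Langevin Stein kernel on an RBF base kernel.

    x: (n, d), y: (m, d); score maps an (n, d) array to the (n, d) array
    of gradients of the log density of the known distribution.
    """
    d = x.shape[1]
    diff = x[:, None, :] - y[None, :, :]            # (n, m, d)
    sq_dists = np.sum(diff**2, axis=-1)             # (n, m)
    k = np.exp(-sq_dists / (2.0 * lengthscale**2))  # base kernel values
    grad_x_k = -diff / lengthscale**2 * k[..., None]
    grad_y_k = -grad_x_k
    s_x, s_y = score(x), score(y)

    # The four terms of the Stein kernel.
    trace_term = k * (d / lengthscale**2 - sq_dists / lengthscale**4)
    term_x = np.einsum("nmd,md->nm", grad_x_k, s_y)
    term_y = np.einsum("nmd,nd->nm", grad_y_k, s_x)
    term_ss = k * (s_x @ s_y.T)
    return trace_term + term_x + term_y + term_ss


def ksd_squared(samples, score, lengthscale=1.0):
    """V-statistic estimate of KSD^2: needs samples and the score only."""
    return langevin_stein_kernel(samples, samples, score, lengthscale).mean()


def stein_mmd_squared(samples, target_draws, score, lengthscale=1.0):
    """MMD^2 with the Stein kernel, using draws from the known distribution."""
    k_qq = langevin_stein_kernel(samples, samples, score, lengthscale)
    k_pp = langevin_stein_kernel(target_draws, target_draws, score, lengthscale)
    k_qp = langevin_stein_kernel(samples, target_draws, score, lengthscale)
    return k_qq.mean() + k_pp.mean() - 2.0 * k_qp.mean()


def standard_normal_score(x):
    """Score (gradient of log density) of a standard normal: -x."""
    return -x


rng = np.random.default_rng(0)
on_target = rng.normal(size=(500, 1))             # matches the known density
off_target = rng.normal(loc=2.0, size=(500, 1))   # shifted away from it
target_draws = rng.normal(size=(1000, 1))         # draws from the known density

ksd_on = ksd_squared(on_target, standard_normal_score)
ksd_off = ksd_squared(off_target, standard_normal_score)
mmd_off = stein_mmd_squared(off_target, target_draws, standard_normal_score)
```

The Stein kernel has zero mean under the known distribution, so the two MMD terms that involve `target_draws` shrink towards zero as more draws are used, leaving only the term that `ksd_squared` computes.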

The Langevin Stein Kernel

The Langevin Stein Kernel has a complex mathematical formulation. We can break down the separate terms to better understand the different components:

The Langevin Stein Kernel for the Laplace Distribution
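For reference, the Langevin Stein Kernel is commonly written as the sum of four terms. The symbols here are assumptions chosen for this note: k is a base kernel, p is the known density, and s_p is its score function.

```latex
k_p(x, y)
  = \nabla_x \cdot \nabla_y k(x, y)
  + \nabla_x k(x, y)^{\top} s_p(y)
  + \nabla_y k(x, y)^{\top} s_p(x)
  + k(x, y)\, s_p(x)^{\top} s_p(y),
\qquad
s_p(x) = \nabla_x \log p(x)
```

Because the density enters only through its score, any normalising constant of p cancels, so the kernel can be evaluated for unnormalised densities.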