PythonOT/POT

ot.sinkhorn2 returns nan when using torch.tensor

Closed this issue · 5 comments

Hi,
I'm trying to incorporate the EMD distance into my loss function. Since EMD is not differentiable, I'm trying to use the Sinkhorn distance as the loss instead.
ot.emd2 and ot.sinkhorn2 work well when the inputs and the cost matrix are numpy arrays.

import numpy as np
import ot

factory = np.array([0, 1.0, 0, 0])
center = np.array([1.0, 0, 0, 0])
M = np.array([[0.0, 1.0, 1.0, 1.414],
              [1.0, 0.0, 1.414, 1.0],
              [1.0, 1.414, 0.0, 1.0],
              [1.414, 1.0, 1.0, 0.0]])

gamma_emd = ot.emd2(factory, center, M)
gamma_sinkhorn = ot.sinkhorn2(factory, center, M=M, reg=0.1, method='sinkhorn_log', verbose=True)
print(gamma_emd)
print(gamma_sinkhorn)

[screenshot: printed output of the numpy example]

However, when I use torch tensors, ot.sinkhorn2 returns nan and ot.sinkhorn returns a matrix with inf values.

import torch
import ot

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

factory = torch.tensor([0, 1.0, 0, 0], requires_grad=True).to(device)
center = torch.tensor([1.0, 0.0, 0, 0], requires_grad=True).to(device)
M = torch.tensor([[0.0, 1.0, 1.0, 1.414],
                  [1.0, 0.0, 1.414, 1.0],
                  [1.0, 1.414, 0.0, 1.0],
                  [1.414, 1.0, 1.0, 0.0]], requires_grad=True).to(device)

gamma_emd = ot.emd2(factory, center, M)
gamma_sinkhorn = ot.sinkhorn2(factory, center, M=M, reg=0.1, method='sinkhorn_log')
mat_sinkhorn = ot.sinkhorn(factory, center, M=M, reg=0.1, method='sinkhorn_log')
print(gamma_emd)
print(gamma_sinkhorn)
print(mat_sinkhorn)

[screenshot: output showing the nan / inf values]

A small first comment: emd2 IS sub-differentiable wrt all its inputs in POT and works pretty well when optimized. It might be slower on GPU though, because the solver is run on CPU, so you have some memory copies happening. I will have a look, because this should not be happening; we test that torch and numpy tensors return the same value...
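
For reference, here is a minimal sketch of using emd2 as a differentiable value with torch tensors (a sketch assuming POT >= 0.8 with the torch backend; the numbers reuse the example above):

import torch
import ot

a = torch.tensor([0.0, 1.0, 0.0, 0.0])
b = torch.tensor([1.0, 0.0, 0.0, 0.0])
M = torch.tensor([[0.0, 1.0, 1.0, 1.414],
                  [1.0, 0.0, 1.414, 1.0],
                  [1.0, 1.414, 0.0, 1.0],
                  [1.414, 1.0, 1.0, 0.0]], requires_grad=True)

loss = ot.emd2(a, b, M)  # exact OT value; the solver itself runs on CPU
loss.backward()          # gradients are propagated back through the solution
print(loss.item())       # 1.0 for this example
print(M.grad)            # (sub)gradient of the OT value wrt the cost matrix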

Hello, I'm sorry, but executing your code with the latest stable POT (with both numpy and torch backends) returns the following on a CUDA server:

It.  |Err         
-------------------
    0|4.440892e-16|
1.0
1.0000000000000004
tensor(1., device='cuda:0', grad_fn=<ValFunctionBackward>)
tensor(1., device='cuda:0', grad_fn=<SumBackward0>)
tensor([[0., 0., 0., 0.],
        [1., 0., 0., 0.],
        [0., 0., 0., 0.],
        [0., 0., 0., 0.]], device='cuda:0', grad_fn=<ExpBackward>)

Thank you!
What does sub-differentiable mean? And can I use emd2 as the loss function in PyTorch?
Do I understand correctly that the motivation for using the Sinkhorn distance (obtained from the entropic regularized OT problem) is that it is differentiable, more efficient to compute than EMD, and the regularized problem is strictly convex?

I tried again and it runs well now! I don't know what was going wrong before.
Thank you again!

ReLU is not differentiable either, and yet it seems to work rather well for deep learning ;). Sinkhorn can be faster when the regularization is not too small, and it has a unique solution (hence unique gradients), but my experience is that on relatively small batches emd2 can be competitive on some problems. Yes, emd2 is PyTorch-differentiable, see the following example:
https://pythonot.github.io/auto_examples/backends/plot_wass2_gan_torch.html#sphx-glr-auto-examples-backends-plot-wass2-gan-torch-py
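
A rough sketch in the same spirit as that example (assuming POT >= 0.8 with a working torch backend): minimize emd2 between a generated point cloud and a target cloud by gradient descent on the point positions.

import torch
import ot

torch.manual_seed(0)
n = 50
x_target = torch.randn(n, 2) + torch.tensor([3.0, 3.0])  # fixed target cloud
x = torch.randn(n, 2, requires_grad=True)                 # positions we optimize
a = torch.ones(n) / n                                      # uniform weights
b = torch.ones(n) / n

opt = torch.optim.Adam([x], lr=0.1)
for it in range(200):
    opt.zero_grad()
    M = ot.dist(x, x_target)   # squared Euclidean cost matrix (differentiable)
    loss = ot.emd2(a, b, M)    # exact OT loss; gradient reaches x through M
    loss.backward()
    opt.step()

print(loss.item())  # should be much smaller than at the start

The gradient flows through the cost matrix built with ot.dist, so the positions x receive gradients from the exact OT loss, in the same way as the linked GAN example.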

Closing the Issue