Quick question
Closed this issue · 2 comments
Hi,
Good question! The answer is: we don't need to do that! Why? Thanks to Danskin's theorem (https://en.wikipedia.org/wiki/Danskin%27s_theorem).
The entropy-regularized UOT problem has a unique solution, so Danskin's theorem gives us a gradient with respect to the ground cost C without differentiating through the solver iterations. Furthermore, the minibatch OT loss is an unbiased estimator of its expectation, so we can exchange gradients and expectations. Minimizing our empirical estimator therefore minimizes the full problem; this is justified by our second theorem.
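To make the Danskin argument concrete, here is a minimal NumPy sketch (my own illustration, not code from the repo) for the simpler balanced entropic OT case: once Sinkhorn has converged to the unique optimal plan `pi*`, the gradient of the regularized objective with respect to the cost matrix `C` is just `pi*` itself, so no backpropagation through the iterations is needed. The function names and the choice of `eps` are assumptions for the example.

```python
import numpy as np

def sinkhorn(C, a, b, eps=0.1, n_iter=500):
    """Solve entropy-regularized balanced OT; return the optimal plan pi*."""
    K = np.exp(-C / eps)          # Gibbs kernel
    u = np.ones_like(a)
    for _ in range(n_iter):       # alternating marginal scalings
        v = b / (K.T @ u)
        u = a / (K @ v)
    return u[:, None] * K * v[None, :]

def entropic_objective(pi, C, eps=0.1):
    """<pi, C> + eps * sum pi (log pi - 1): the value being minimized."""
    return np.sum(pi * C) + eps * np.sum(pi * (np.log(pi) - 1.0))

def loss_and_grad(C, a, b, eps=0.1):
    """By Danskin's theorem, since pi* is the unique minimizer,
    the gradient of the optimal value w.r.t. C is pi* itself --
    pi* is treated as a constant, no differentiation through Sinkhorn."""
    pi = sinkhorn(C, a, b, eps)
    return entropic_objective(pi, C, eps), pi  # gradient w.r.t. C is pi*
```

A quick finite-difference check (perturb one entry of `C`, re-solve, and compare the change in the objective to the corresponding entry of `pi*`) confirms the envelope gradient numerically. In a deep-learning framework, the same idea amounts to detaching `pi*` from the computation graph before forming the loss.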
Note that your suggestion would also work: unrolling a fixed budget of Sinkhorn iterations gives a differentiable loss. But with a small budget you do not reach the optimal solution, which is why I prefer using Danskin's theorem. This is also what the original DeepJDOT algorithm did: https://github.com/bbdamodaran/deepJDOT
Thanks a lot for the detailed & quick response.