Wasserstein hangup, same number of points at infinity
jacleveland opened this issue · 11 comments
The issue is almost surely with the high multiplicity of the points in both diagrams. You could try manually perturbing them by a tiny epsilon and see if that solves the problem.
For a proper diagnosis and solution, we need @grey-narn.
Yeah, unfortunately I'm not sure I can get rid of the multiplicity. I am trying to use persistence to compute a distance between artificial neural networks, but we are using a home-brewed filtration that currently only really works for 0th homology, so it is basically just persistence for a weighted graph. The multiplicity arises because the architecture we're studying right now is a convolutional network. Ours has 25 weights in the filter, and these are basically pasted all over the graph, so there should be 25 groups of multiplicities.
Anyway the issue never came up on the computer I used this summer, I have a different setup I'm using now.
Thanks for your time anyway, I am really impressed with the package overall.
Is there any way the package could be written to utilize CUDA? Or has anyone done that before? It seems to me like the Wasserstein computations could be pretty parallel.
Well, you can always manually perturb the diagrams by epsilon. This will fix the multiplicity problem and will introduce only an arbitrarily small error in the Wasserstein distance.
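A minimal sketch of the perturbation suggested above, assuming the diagram is available as a plain list of (birth, death) pairs. The function name `perturb_diagram` and the default `eps` are hypothetical and not part of Dionysus or Hera. Points at infinity are left untouched so their count stays the same in both diagrams.

```python
import random

def perturb_diagram(points, eps=1e-6, seed=0):
    """Return a copy of `points` with every finite coordinate shifted by a
    uniform random offset in [-eps, eps]. Infinite coordinates are kept as-is."""
    rng = random.Random(seed)

    def jitter(x):
        if x in (float("inf"), float("-inf")):
            return x
        return x + rng.uniform(-eps, eps)

    return [(jitter(b), jitter(d)) for b, d in points]

# Four copies of the same point plus one point at infinity.
diagram = [(0.0, 0.25)] * 4 + [(0.0, float("inf"))]
perturbed = perturb_diagram(diagram)
```

If the perturbed points are later stored in single precision, `eps` has to be well above the float spacing at the coordinates' magnitude (about 3e-8 near 0.25), otherwise the offsets are rounded away and the duplicates come back. Note also that the offset can push a birth slightly past a death for points very close to the diagonal.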
As for parallelization, we investigated it a couple of summers ago, but it's not nearly as simple as it sounds. There are long serial bottlenecks in the algorithm that we are using.
Dionysus uses float for simplex data, so the points in the diagram have float coordinates.
@mrzv If I replace double with float, my executable gets stuck as well. It seems to be the familiar numerical issue: the epsilon in epsilon-scaling gets too small to guarantee the increase of prices, and auction makes no progress.
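The stall described here can be reproduced in isolation. The sketch below is not Hera code: `to_float32` is a hypothetical helper that emulates single precision with `struct`, and the values of `price` and `eps` are made up to show a bid increment that is absorbed by rounding in float but not in double.

```python
import struct

def to_float32(x):
    """Round a Python float (a double) to the nearest single-precision value."""
    return struct.unpack("f", struct.pack("f", x))[0]

price = 1.0   # hypothetical current price of an item
eps = 1e-9    # hypothetical increment late in epsilon-scaling

# Double precision: the price really increases.
new_price_double = price + eps

# Single precision: the sum rounds back to the old price.
new_price_float = to_float32(to_float32(price) + to_float32(eps))

print(new_price_double > price)  # True
print(new_price_float > price)   # False
```

Once the second comparison is false, a bid no longer changes the price, which is consistent with an auction loop that keeps running without making progress.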
@grey-narn So what's the right strategy? If I switch simplex data to double, then the coordinates of the points in the diagram become double, and I'm guessing we can trigger numerical issues again, even if not with this input. I can keep everything as float in Dionysus, but convert it to double when passing to Hera. Are we guaranteed to avoid numerical issues then? I'm not sure if we ever figured this out. Do you know?
I just realized what you were asking. Well, having input in float and doing computations in double cannot help in all situations, because in the equation for epsilon we need to divide by the number of bidders and multiply by the cost of the current matching. An example that requires an arbitrarily small value of epsilon can be constructed by taking two diagrams of high cardinality that have a small distance to each other. Remember that if we want q>1, then we cannot remove duplicate points, so we can just take two points which are very close to each other and add as many duplicate points as we want.
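To make the dependence concrete, here is a toy calculation. It does not reproduce Hera's actual formula: `required_epsilon` and the factor `delta` are assumptions that only mirror the description above (proportional to the matching cost, inversely proportional to the number of bidders). The inputs are invented, and `price` stands for an item price on the scale of a typical pairwise cost.

```python
def required_epsilon(matching_cost, n_bidders, delta=0.01):
    """Illustrative threshold only: grows with the cost of the current
    matching and shrinks with the number of bidders."""
    return delta * matching_cost / n_bidders

def makes_progress(price, eps):
    """True if adding eps to price changes the value in double precision."""
    return price + eps > price

price = 1.0  # hypothetical item price

# (number of bidders, cost of current matching): more points, closer diagrams.
cases = [(10, 1e-2), (10_000, 1e-6), (10_000_000, 1e-10)]
results = [makes_progress(price, required_epsilon(cost, n)) for n, cost in cases]
print(results)  # [True, True, False]
```

In the last case even double arithmetic absorbs the increment, so widening the type only moves the point at which the auction stalls; it does not remove it.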
@grey-narn Yeah, the general solution would require a much more complex yoga with exact predicates and such. We are not going there any time soon. I guess the real question before me is whether to switch all simplex data to double, or keep it as float but convert to double when passing to Hera. Neither solution is perfect; it's just a question of what makes the most sense. I'm leaning towards the latter, since it's the least disruptive.
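A small sketch of the second option, converting at the boundary. It uses only the standard `array` module as a stand-in for single-precision storage. The name `to_double_points` is hypothetical, and no Dionysus or Hera call is shown.

```python
from array import array

# Hypothetical diagram coordinates stored in single precision.
births = array("f", [0.1, 0.2, 0.3])
deaths = array("f", [0.7, 0.8, float("inf")])

def to_double_points(births, deaths):
    """Widen float32 coordinates to Python floats (doubles).
    Every float32 value is exactly representable as a double."""
    return [(float(b), float(d)) for b, d in zip(births, deaths)]

points = to_double_points(births, deaths)

# The coordinates keep their float32 rounding: the stored value of 0.1
# is the nearest float32, not the nearest double.
print(points[0][0] == 0.1)  # False
```

Widening leaves the input values unchanged, so duplicate points stay duplicates. Only the arithmetic done afterwards gains precision.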