Validating distances against reference implementations
sdmccabe opened this issue · 10 comments
For each distance we should check that either (i) netrd is the only public implementation of the distance, or (ii) netrd's implementation of the distance produces similar outputs given the same inputs. We've done this for a bunch of them already, typically when originally implementing the distance, but we should make this process more explicit so that we don't accidentally overlook one.
- Communicability JSD
- Degree Divergence
- Deltacon
- Distributional NBD
- DK-series distance
- Frobenius
- Hamming
- Hamming-Ipsen-Mikhailov
- Ipsen-Mikhailov
- Jaccard
- Laplacian Spectral
- NBD
- NetLSD
- NetSimile
- Onion Divergence
- Polynomial Dissimilarity
- Portrait Divergence
- Quantum JSD
- Resistance Perturbation
I'll start by checking off the ones I think are novel; I'm reasonably certain that there are more we know are validated.
There are differences in output between our distance and the reference implementation of Portrait Divergence, but the differences are consistently small (the largest I've seen is 0.005, and it's usually more like 0.001). I'll keep investigating but I'd guess it's nothing.
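A minimal sketch of how the "consistently small" check could be made explicit, assuming we have already collected paired outputs from both implementations on the same inputs. The helper name `compare_outputs`, the tolerance, and the sample values are all hypothetical placeholders, not measurements or netrd code.

```python
def compare_outputs(ours, reference, abs_tol=0.01):
    """Compare paired distance values from two implementations.

    Returns (within_tolerance, max_abs_diff). Both arguments are
    sequences of floats computed on the same graph pairs, in the
    same order.
    """
    if len(ours) != len(reference):
        raise ValueError("need one reference value per computed value")
    diffs = [abs(a - b) for a, b in zip(ours, reference)]
    max_diff = max(diffs, default=0.0)
    return max_diff <= abs_tol, max_diff


# Placeholder values for illustration only (not real outputs).
ours = [0.500, 0.200]
reference = [0.501, 0.205]
ok, worst = compare_outputs(ours, reference, abs_tol=0.01)
print(f"within tolerance: {ok}, largest difference: {worst:.4f}")
```

Reporting the largest difference alongside the pass/fail flag makes it easier to notice if the gap grows on some family of graphs.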
We should bump the PyPI version after finishing this.
HIM is producing different outputs from the R NetworkDistance implementation for RGGs (N=200, p=0.26, using the edgelists from the graphwend repo); will need to investigate further.
@leotrs I've checked off NBD because I assume the implementations are the same.
At this point I wouldn't be surprised if netrd's implementation is more up to date than mine. However, you can forget about NBD as I am the maintainer of the other one. If the outputs from the two different repos are different, then probably netrd's are correct...
NetSimile is a frustrating one since there isn't a reference implementation in the sense of the authors' code, so we're assuming the other independent implementations are correct. When I was debugging some NetSimile issues back in the spring I remember comparing the outputs to those from the netcomp library; I don't know if anything has changed since but I believe they were producing similar or identical outputs.
Then we could use it only as a touchstone. As long as we're in their ballpark, we're good.
Frobenius and Jaccard depend on row ordering, yes?
Unrelatedly, they both seem to be simple enough that we can just check them off?
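A quick way to see the ordering dependence, as a sketch that assumes the plain textbook definitions (Frobenius norm of the adjacency difference, Jaccard distance on labeled edge sets) rather than netrd's actual code. All function names here are hypothetical. Relabeling the nodes of a path graph gives an isomorphic graph, yet both distances come out nonzero.

```python
import math


def adjacency(edges, n):
    """Dense symmetric 0/1 adjacency matrix for an undirected graph."""
    A = [[0] * n for _ in range(n)]
    for u, v in edges:
        A[u][v] = 1
        A[v][u] = 1
    return A


def frobenius(A, B):
    """Frobenius norm of the entrywise difference A - B."""
    return math.sqrt(
        sum((a - b) ** 2 for row_a, row_b in zip(A, B) for a, b in zip(row_a, row_b))
    )


def jaccard(edges_a, edges_b):
    """Jaccard distance between two undirected, labeled edge sets."""
    ea = {frozenset(e) for e in edges_a}
    eb = {frozenset(e) for e in edges_b}
    union = ea | eb
    if not union:
        return 0.0
    return 1 - len(ea & eb) / len(union)


def relabel(edges, perm):
    """Apply a node relabeling (dict old -> new) to an edge list."""
    return [(perm[u], perm[v]) for u, v in edges]


path = [(0, 1), (1, 2)]                         # path graph 0 - 1 - 2
swapped = relabel(path, {0: 1, 1: 0, 2: 2})     # same graph, nodes 0 and 1 swapped

print(frobenius(adjacency(path, 3), adjacency(swapped, 3)))
print(jaccard(path, swapped))
```

So when comparing against another implementation, both sides need the same node ordering, otherwise a mismatch may just reflect labeling.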