anhaidgroup/deepmatcher

SIGMOD experiments reproducibility

alex-bogatu opened this issue · 2 comments

Hi,
I am trying to reproduce the experiments from the SIGMOD 2018 paper: http://pages.cs.wisc.edu/~anhai/papers1/deepmatcher-sigmod18.pdf. I am having a hard time finding the right setup, and for most of the datasets my results are far worse than those reported in the paper. Could you give me a hint about the right setup? For example, what are the parameters for the hybrid setup? Using the defaults leads to poor results, and following the existing guides in the repository did not help much.

As an example, for the (complete) iTunes-Amazon scenario the best I could obtain was F1: 35.09 | Prec: 33.33 | Rec: 37.04, whereas the paper reports better results.

Thank you!

Hi, here's a colab notebook showing how to reproduce the numbers in the paper for iTunes-Amazon structured: https://colab.research.google.com/drive/1CQFejG3-KeuFmMChsEoOeqypTS7njyJb#scrollTo=W4ixyezcQJPG

Note that neural network models can be unstable, especially on small datasets, so you may need to run training several times. For our SIGMOD paper, if I remember correctly, we ran each experiment 3-5 times and reported the median.
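
In case it helps, here is a rough sketch of the "run several times and take the median" procedure. It is not the script used for the paper: `median_of_runs`, `run_once`, and `fake_run` are hypothetical names, and the scores are placeholders rather than real results. The assumption is that you wrap your own training and evaluation code in a function that takes a seed and returns the test-set F1.

```python
import statistics
from typing import Callable, List, Tuple


def median_of_runs(
    run_once: Callable[[int], float], n_runs: int = 5
) -> Tuple[float, List[float]]:
    """Call run_once(seed) n_runs times and return (median score, all scores)."""
    if n_runs < 1:
        raise ValueError("n_runs must be at least 1")
    scores = [run_once(seed) for seed in range(n_runs)]
    return statistics.median(scores), scores


# Hypothetical stand-in for a real experiment. Replace it with a function that
# trains a fresh model using the given seed and returns its test-set F1.
def fake_run(seed: int) -> float:
    placeholder_scores = [0.2, 0.8, 0.9, 0.7, 0.85]  # not real results
    return placeholder_scores[seed % len(placeholder_scores)]


median_f1, all_f1 = median_of_runs(fake_run, n_runs=5)
print(f"F1 per run: {all_f1}")
print(f"Median F1: {median_f1}")
```

Reporting the median rather than the best or the mean keeps a single collapsed run (like the 0.2 placeholder above) from dominating the reported number.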

Please re-open if you have any other questions.