chenchongthu/JNSKR

About the hyper-parameter tuning

DexterZeng opened this issue · 2 comments

Hi, thanks for sharing the code; it has been really helpful!

You mentioned that dropout, coefficient, c0, and c1 are important parameters that need to be tuned for new datasets. However, I wonder whether you could provide some directions/suggestions on how to tune these hyper-parameters. Currently I am running the model on a small, sparse dataset extracted from the original amazon-book dataset. Unfortunately, the model attains its best performance within the first few epochs, and the results then keep dropping as training continues. I changed these hyper-parameters and the performance did not improve. Is there a general guideline for parameter tuning?

Besides, is there any reason/explanation for how c0/c1 are calculated in the code? I guess understanding it might help with the parameter tuning.

Thanks for your time!

Hi, thanks for your interest in our work!

c0 and c1 determine the overall weight of non-observed instances in implicit feedback data. Specifically, c0 is for the recommendation task and c1 is for the knowledge embedding task.
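As a rough illustration of how such a weight might be derived (a sketch only; the exact formula lives in the repo's code and the paper's Section 3.3, and the popularity-based form below is an assumption, not a quote of the implementation):

```python
import numpy as np

def negative_weights(item_freq, c0, alpha=0.75):
    """Popularity-based per-item negative weights: w_j = c0 * f_j^alpha / sum_k f_k^alpha.
    c0 scales the overall weight of non-observed instances; alpha controls how
    strongly popular items are up-weighted. Assumed form, for illustration."""
    f = np.asarray(item_freq, dtype=np.float64) ** alpha
    return c0 * f / f.sum()

# More popular items get larger negative weights; the weights sum to c0.
w = negative_weights([100, 10, 1], c0=0.1)
```

c1 would play the same scaling role for the knowledge-embedding side, applied to the entity/relation frequencies instead of item frequencies.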

To tune c0 and c1, a simple approach is to first find an optimal uniform value for the negative weight, and then tune c0 and c1 so that the mean of the resulting weight distribution is close to that uniform value.

Specifically, first replace negative_c and negative_ck in Main_JNSKR.py (line 96) with a constant from {0.001, 0.005, 0.01, 0.02, 0.05, 0.1, 0.2, 0.5} and pick the value that works best. Then tune c0 and c1 so that the mean of the weight distribution is near that uniform value. You can see the weight values in the output printed by our_helper.py (lines 61-62).
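The matching step above can be sketched as a small search: given the best uniform constant, pick the c0 whose mean weight lands closest to it. (The `negative_weights` form below is an assumed popularity-based weighting for illustration; in practice you would read the mean from the values printed by our_helper.py.)

```python
import numpy as np

def negative_weights(item_freq, c0, alpha=0.75):
    # Assumed popularity-based form; the repo's exact formula may differ.
    f = np.asarray(item_freq, dtype=np.float64) ** alpha
    return c0 * f / f.sum()

def pick_c0(item_freq, uniform_value, candidates):
    """Pick the c0 whose mean per-item weight is closest to the tuned uniform value."""
    means = {c0: negative_weights(item_freq, c0).mean() for c0 in candidates}
    return min(means, key=lambda c0: abs(means[c0] - uniform_value))
```

The same matching would be repeated for c1 against the uniform value found for negative_ck.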

For dropout, you can tune it in the range [0.1, 1.0].

For the coefficient, you only need to tune the second term, e.g. in {0.001, 0.01, 0.05, ...}.

Generally, the negative weight is a little difficult to tune. A very detailed explanation of how c0/c1 are calculated in the code can be found in the paper (pages 7-8, Section 3.3, Weighting Strategies for Missing Data).

Thanks a lot for the detailed explanation!