MyTHWN/MTER

Reproduce experiment results on given Yelp data


Hi Nan Wang,

I would like to reproduce the results in your SIGIR'18 paper. Did you use a separate validation set for MTER, or just the training & testing data provided in the yelp_restaurant_recursive_entry_sigir directory?

Are the parameters given in the code the best setting?

U_dim = 15
I_dim = 15
F_dim = 12
W_dim = 12
lmd_BPR = 5
num_iter = 20000
num_processes = 4
lr = 0.1
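
For reference, my understanding of what these control (the comments below are my own interpretation of the variable names, not documentation from the repo):

```python
# Sketch of the run configuration; the names match the variables in the
# provided code, but the comments are my interpretation, not the authors'.
params = dict(
    U_dim=15,         # user latent dimension
    I_dim=15,         # item latent dimension
    F_dim=12,         # feature (aspect) latent dimension
    W_dim=12,         # word latent dimension
    lmd_BPR=5,        # weight of the BPR ranking term in the objective
    num_iter=20000,   # number of SGD iterations
    num_processes=4,  # parallel worker processes
    lr=0.1,           # learning rate
)
```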

Best regards,
Hoang

I found that 'yelp_restaurant_recursive_entry_sigir' contains yelp_recursive_test.entry and yelp_recursive_train.entry, which are split from yelp_recursive.entry, correct?

I ran the code with the provided data & parameters and got the following results:

| Model | RMSE   | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 |
|-------|--------|---------|---------|---------|----------|
| MTER  | 1.1033 | 0.0016  | 0.0038  | 0.0090  | 0.0151   |

Can you tell me more about how to get the candidate list of 200?
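
To make my question concrete, here is my current guess at how the 200-item candidate list is built (a sketch under my own assumptions; none of these names come from the repo):

```python
import random

def build_candidate_list(positive, seen_items, all_items, n_candidates=200):
    """My guess: the held-out positive plus randomly sampled negatives.

    positive    -- the ground-truth test item for this user
    seen_items  -- set of items the user interacted with in train/test
    all_items   -- the full item catalogue
    """
    pool = [i for i in all_items if i not in seen_items and i != positive]
    negatives = random.sample(pool, n_candidates - 1)
    return [positive] + negatives
```

Is this the protocol you used, or is the list constructed differently?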

I randomly split the data into 80% training, 10% validation, and 10% test (see the split sketch after this list) and compared with the following baselines:

  • MostPopular
  • NMF: latent factors k = 15, learning rate = 0.005, lambda_u = 0.06, lambda_v = 0.06, with 10k iterations
  • BPR: latent dimension k = 15, learning rate = 0.05, lambda_reg = 0.01, with 100 iterations
  • MTER: I used the same params as above (but with only 1 thread, for reproducibility):
    U_dim = 15
    I_dim = 15
    F_dim = 12
    W_dim = 12
    lmd_BPR = 5
    num_iter = 20000
    num_processes = 4
    lr = 0.1
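
The split itself is a plain random shuffle over the interaction records (sketch; `records` stands for whatever list of (user, item, rating, review) tuples is loaded from the .entry files):

```python
import random

def split_80_10_10(records, seed=42):
    """My random 80/10/10 split over interaction records."""
    rng = random.Random(seed)
    records = list(records)
    rng.shuffle(records)
    n = len(records)
    return (records[:int(0.8 * n)],              # 80% train
            records[int(0.8 * n):int(0.9 * n)],  # 10% validation
            records[int(0.9 * n):])              # 10% test
```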

And the results are as follows:

| Model   | RMSE   | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|---------|--------|---------|---------|---------|----------|-----------|----------|
| MostPop | N/A    | 0.0102  | 0.0134  | 0.0193  | 0.0256   | 0.0235    | 21.038   |
| NMF     | 1.0117 | 0.0009  | 0.0014  | 0.0023  | 0.0039   | 421.0895  | 31.8799  |
| BPR     | N/A    | 0.0281  | 0.0394  | 0.0614  | 0.0826   | 6.7818    | 28.3796  |
| MTER    | 1.1055 | 0.0012  | 0.0037  | 0.0094  | 0.0154   | 7642.2824 | 38.8343  |

Could you please provide further guidelines on how to tune the hyperparameters for MTER?
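
So far I have only tried a naive grid search on the validation split, along these lines (sketch; `train_and_eval` is a placeholder I would wire to the actual training code, and the grid values are just examples):

```python
import itertools

def train_and_eval(cfg):
    """Placeholder: train MTER with cfg, return validation NDCG@10."""
    raise NotImplementedError  # wire this to the actual training code

# Hypothetical ranges; I do not know which values you explored.
grid = {
    "lr": [0.01, 0.05, 0.1],
    "lmd_BPR": [1, 5, 10],
    "U_dim": [10, 15, 20],
}

best_score, best_cfg = float("-inf"), None
for values in itertools.product(*grid.values()):
    cfg = dict(zip(grid, values))
    score = train_and_eval(cfg)
    if score > best_score:
        best_score, best_cfg = score, cfg
print(best_cfg, best_score)
```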

BTW, try a larger number of iterations, e.g., 50000, to see if the model is under-fitted.

Alright, this is the result when I increased the number of iterations to 50000:

| Model | RMSE   | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s)  | Test (s) |
|-------|--------|---------|---------|---------|----------|------------|----------|
| MTER  | 1.2391 | 0.0076  | 0.0109  | 0.0177  | 0.0256   | 17409.1336 | 38.9075  |

So the training had not converged. Do you have any suggestions for making the training converge faster?
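
For now I detect convergence by logging the objective periodically and stopping when it plateaus (sketch; `sgd_step` and `full_loss` are stand-ins for the repo's actual routines):

```python
def sgd_step():
    """Placeholder for one MTER SGD update from the training loop."""

def full_loss():
    """Placeholder for evaluating the full MTER objective."""
    return 0.0

def train_until_plateau(max_iter=1_000_000, check_every=5_000, rel_tol=1e-4):
    """Stop early once the relative loss improvement falls below rel_tol."""
    prev = float("inf")
    for it in range(1, max_iter + 1):
        sgd_step()
        if it % check_every == 0:
            loss = full_loss()
            if prev != float("inf") and (prev - loss) / max(abs(prev), 1e-12) < rel_tol:
                print(f"plateaued at iteration {it}, loss = {loss:.4f}")
                break
            prev = loss
```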

Alright, the performance improved with your updated params.

| Model | RMSE   | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s)   | Test (s) |
|-------|--------|---------|---------|---------|----------|-------------|----------|
| MTER  | 1.3759 | 0.0234  | 0.0337  | 0.0530  | 0.0733   | 104969.1012 | 37.2876  |

However, it still performs worse than BPR.

Could you please send me your evaluation code? I would like to reproduce the results from the MTER paper.

Hi Nan Wang,

I tried increasing the number of iterations to 500K and 1 million; MTER achieves the following results.

| Model                  | RMSE   | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|------------------------|--------|---------|---------|---------|----------|-----------|----------|
| MTER (500K iterations) | 1.3733 | 0.0272  | 0.0381  | 0.0597  | 0.0812   | 482968    | 57       |
| MTER (1M iterations)   | 1.3725 | 0.0292  | 0.0409  | 0.0630  | 0.0851   | 986867    | 29       |

I noticed that the NDCG results are smaller than those reported in your paper. Perhaps this is because I evaluated the ranking performance with negatives drawn from all items.

As you mentioned earlier:

> Try a candidate list of 200 or directly compare these numbers with other baselines under the same setting and you will see

Do you mean that the reported results were evaluated with a smaller negative sample (e.g., randomly sampling 200 candidates as negatives)?
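
Concretely, this is the evaluation I suspect was used (sketch; `scores` is a placeholder for the model's predicted relevance per item):

```python
import math

def ndcg_at_k(scores, positive, candidates, k=10):
    """NDCG@k when exactly one relevant item is in the candidate list.

    With a single positive, the ideal DCG is 1, so NDCG reduces to
    1 / log2(rank + 1) if the positive lands in the top k, else 0.
    """
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    rank = ranked.index(positive) + 1
    return 1.0 / math.log2(rank + 1) if rank <= k else 0.0
```

Ranking against only 200 candidates gives the positive a much smaller rank than ranking against the full catalogue, so NDCG@K comes out correspondingly larger; that alone could explain the gap between our numbers.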

Please correct me if I am wrong.
Thanks,
Hoang