Reproduce experiment results on given Yelp data
Hi Nan Wang,
I would like to reproduce the results in your SIGIR'18 paper. Did you use a separate validation set for MTER, or just the training & testing data provided inside the yelp_restaurant_recursive_entry_sigir directory?
Are the parameters given in the code the best setting?
MTER/parallel_implementation/MTER_tripletensor_tucker.py, lines 100 to 108 in 0b9ba50
Best regards,
Hoang
I found that 'yelp_restaurant_recursive_entry_sigir' contains yelp_recursive_test.entry and yelp_recursive_train.entry, which are split from yelp_recursive.entry, aren't they?
I ran the code with the provided data & parameters and got the following results:
| | RMSE | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 |
|---|---|---|---|---|---|
| MTER | 1.1033 | 0.0016 | 0.0038 | 0.0090 | 0.0151 |
Can you tell me more about how to get the candidate list of 200?
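For reference, the NDCG@k numbers reported in this thread can be computed with the standard binary-relevance formulation below. This is a sketch of the usual metric definition, not necessarily the exact evaluation code used in the paper:

```python
import math

def ndcg_at_k(ranked_items, relevant_items, k):
    """NDCG@k for one user: binary relevance, log2 rank discount."""
    dcg = 0.0
    for rank, item in enumerate(ranked_items[:k], start=1):
        if item in relevant_items:
            dcg += 1.0 / math.log2(rank + 1)
    # Ideal DCG: all relevant items placed at the top of the list
    ideal_hits = min(len(relevant_items), k)
    idcg = sum(1.0 / math.log2(r + 1) for r in range(1, ideal_hits + 1))
    return dcg / idcg if idcg > 0 else 0.0
```

The per-user scores are then averaged over all test users to produce table entries like those above.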
I randomly split the data into 80% training, 10% validation, and 10% test. I compare with the following baselines:
- MostPopular
- NMF: latent factors k = 15, learning rate = 0.005, lambda_u = 0.06, lambda_v = 0.06, with 10k iterations
- BPR: latent dimension k = 15, learning rate = 0.05, lambda_reg = 0.01, with 100 iterations
- MTER: the same params as in MTER/parallel_implementation/MTER_tripletensor_tucker.py, lines 100 to 108 in 0b9ba50 (but with only 1 thread so the results are reproducible)
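The 80/10/10 random split described above can be sketched as follows. This is a minimal illustration of the protocol I used; the function name and the fixed seed are my own, not from the repo:

```python
import random

def split_interactions(records, seed=42):
    """Randomly split records into 80% train, 10% validation, 10% test."""
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = records[:]
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = int(n * 0.8)
    n_val = int(n * 0.1)
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]
    return train, val, test
```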
And the results are as follows:
| | RMSE | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|---|---|---|---|---|---|---|---|
| MostPop | N/A | 0.0102 | 0.0134 | 0.0193 | 0.0256 | 0.0235 | 21.038 |
| NMF | 1.0117 | 0.0009 | 0.0014 | 0.0023 | 0.0039 | 421.0895 | 31.8799 |
| BPR | N/A | 0.0281 | 0.0394 | 0.0614 | 0.0826 | 6.7818 | 28.3796 |
| MTER | 1.1055 | 0.0012 | 0.0037 | 0.0094 | 0.0154 | 7642.2824 | 38.8343 |
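For context, the BPR baseline with the hyperparameters listed earlier (k = 15, learning rate = 0.05, lambda_reg = 0.01, 100 iterations) can be sketched as a minimal matrix-factorization trainer. This is my own illustration of standard BPR-MF, not the implementation used for the numbers above:

```python
import numpy as np

def train_bpr(interactions, n_users, n_items,
              k=15, lr=0.05, reg=0.01, n_epochs=100, seed=0):
    """Minimal BPR-MF: SGD on sampled (user, pos, neg) triples."""
    rng = np.random.default_rng(seed)
    U = rng.normal(0, 0.01, (n_users, k))  # user latent factors
    V = rng.normal(0, 0.01, (n_items, k))  # item latent factors
    pos_sets = {}
    for u, i in interactions:
        pos_sets.setdefault(u, set()).add(i)
    for _ in range(n_epochs):
        for u, i in interactions:
            # sample a negative item the user has not interacted with
            j = rng.integers(n_items)
            while j in pos_sets[u]:
                j = rng.integers(n_items)
            u_f, i_f, j_f = U[u].copy(), V[i].copy(), V[j].copy()
            x = u_f @ (i_f - j_f)          # score difference
            g = 1.0 / (1.0 + np.exp(x))    # gradient of -log sigmoid(x)
            U[u] += lr * (g * (i_f - j_f) - reg * u_f)
            V[i] += lr * (g * u_f - reg * i_f)
            V[j] += lr * (-g * u_f - reg * j_f)
    return U, V
```

Ranking a user's items by `U[u] @ V.T` then feeds the NDCG@k evaluation.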
Could you please provide further guidance on how to tune the hyperparameters for MTER?
BTW, try a larger number of iterations, e.g., 50000, to see if it is under-fitted.
Alright, this is the result when I increase the number of iterations to 50000:
| | RMSE | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|---|---|---|---|---|---|---|---|
| MTER | 1.2391 | 0.0076 | 0.0109 | 0.0177 | 0.0256 | 17409.1336 | 38.9075 |
So the training had not converged yet. Do you have any suggestions for making the training converge faster?
Alright, the performance is improved with your newly updated params:
| | RMSE | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|---|---|---|---|---|---|---|---|
| MTER | 1.3759 | 0.0234 | 0.0337 | 0.0530 | 0.0733 | 104969.1012 | 37.2876 |
However, it still performs worse than BPR.
Could you please send me your evaluation code? I would like to reproduce your result in MTER paper.
Hi Nan Wang,
I tried increasing the number of iterations to 500K and 1 million; MTER achieves the following results.
| | RMSE | NDCG@10 | NDCG@20 | NDCG@50 | NDCG@100 | Train (s) | Test (s) |
|---|---|---|---|---|---|---|---|
| MTER-500K iterations | 1.3733 | 0.0272 | 0.0381 | 0.0597 | 0.0812 | 482968 | 57 |
| MTER-1M iterations | 1.3725 | 0.0292 | 0.0409 | 0.0630 | 0.0851 | 986867 | 29 |
I noticed that the NDCG results are smaller than what has been reported in your paper. Perhaps this is because I evaluated the ranking performance with negative samples drawn from all items.
As you mentioned earlier:
> Try a candidate list of 200 or directly compare these numbers with other baselines under the same setting and you will see
Do you mean that the reported results were evaluated on a smaller negative-sample set (e.g., 200 randomly sampled candidates as negatives)?
Please correct me if I am wrong.
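If the protocol is indeed a 200-item candidate list, the evaluation could look roughly like this. This is my own hypothetical sketch of such a protocol; `sampled_ndcg_at_k` and its arguments are assumptions, not the authors' evaluation code:

```python
import math
import random

def sampled_ndcg_at_k(score_fn, test_pairs, all_items, train_pos,
                      n_candidates=200, k=10, seed=0):
    """NDCG@k where each held-out positive is ranked against a
    candidate list of (n_candidates - 1) randomly sampled negatives."""
    rng = random.Random(seed)
    total = 0.0
    for user, pos_item in test_pairs:
        # sample distinct negatives the user has not interacted with
        negs = set()
        while len(negs) < n_candidates - 1:
            j = rng.choice(all_items)
            if j != pos_item and j not in train_pos.get(user, set()):
                negs.add(j)
        candidates = list(negs) + [pos_item]
        ranked = sorted(candidates, key=lambda i: score_fn(user, i),
                        reverse=True)
        rank = ranked.index(pos_item) + 1
        # one relevant item per user, so IDCG = 1 and NDCG = DCG
        total += 1.0 / math.log2(rank + 1) if rank <= k else 0.0
    return total / len(test_pairs)
```

Restricting the ranking to 200 candidates instead of all items would yield much larger NDCG values, which could explain the gap between my numbers and the paper's.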
Thanks,
Hoang