HKUDS/DCRec

Questions about dataset statistics and experimental results

Closed this issue · 6 comments

I'm so sorry to bother you again. After following the data preprocessing method you provided for the ML-20M dataset, I encountered two issues during the experiment.

The first issue is that in "Table 1: Detailed statistics of experimental datasets," the "Interactions" figure appears to be the sum of the session_id counts from the ml-20m.train.inter and ml-20m.test.inter files, since the DCRec code concatenates them. This sum does not reflect the actual number of user interactions, which appears to be 1,856,746.
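To illustrate the point, here is a toy sketch (the column names and the leave-one-out-style overlap between the two files are my assumptions for illustration, not DCRec's actual data) of why summing the row counts of the two .inter files can overstate the interaction count:

```python
import pandas as pd

# Toy stand-ins for ml-20m.train.inter / ml-20m.test.inter; RecBole ".inter"
# atomic files are tab-separated, but in-memory frames suffice here.
train = pd.DataFrame({
    "session_id": [1, 1, 2],
    "item_id":    [10, 11, 20],
})
# In a leave-one-out-style split, the test file repeats each training prefix
# plus the held-out item, so its rows overlap with the training rows.
test = pd.DataFrame({
    "session_id": [1, 1, 1, 2, 2],
    "item_id":    [10, 11, 12, 20, 21],
})

# Summing row counts of both files double-counts the shared prefixes:
reported = len(train) + len(test)          # 8

# Deduplicating the concatenation counts each (session, item) row once:
combined = pd.concat([train, test], ignore_index=True)
actual = len(combined.drop_duplicates())   # 5

print(f"summed rows: {reported}, unique interactions: {actual}")
```

Whether deduplication recovers the exact true count depends on how the split files were generated, but the gap between the two numbers is the point.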

The second issue is that, following the data processing method you provided, I used the same data to generate input for both the DCRec and SURGE models, and I verified that both receive 1,856,746 interactions. With DCRec I obtained an ndcg@10 of 0.3001 on the test set, which matches the result reported in the paper. With SURGE, however, I obtained an ndcg@10 of 0.4391 on the test set, which is higher than the result presented in the paper.
Could it be because DCRec uses more than 100 negative samples on the validation and test sets?

Thank you for your understanding and assistance; I greatly appreciate your guidance in addressing these issues. If my description is incorrect, please feel free to correct me. Once again, thank you for your time and support.

Hi @BiuBiuBia ,

Could you provide details on how you run SURGE? Did you make sure the negatives are identical?

I deeply apologize for my mistake. I have realized that the default negative sampling methods of the two models are different. Thank you very much for your assistance.

I'm sorry to bother you again. After receiving your reminder, I tried using popularity-based negative sampling for the SURGE model. However, due to my limited expertise, I couldn't achieve the same results as in your paper. I noticed that your project files do not include the SURGE model. Could you please share the code you used for the SURGE model in your experiments? I am deeply grateful for the assistance you have provided.

Because of the ID remapping issue, I am unable to feed the negative samples generated by DCRec directly into SURGE.

Therefore, I attempted to mimic your popularity-based sampling in SURGE. Following DCRec's approach, I extracted the last two items from the preprocessed interaction sequences to build an alias_table, then used the same pop_sampling function to draw negatives, excluding items already present among the positives. I repeated the sampling until I had one hundred negatives for each sequence.
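A minimal sketch of what I implemented (the function name echoes pop_sampling from the thread, but the details here, such as drawing distinct negatives and skipping the alias-table optimization, are my approximations, not DCRec's actual code):

```python
import numpy as np

def pop_sampling(pop_counts, positives, n_neg=100, rng=None):
    """Draw n_neg distinct popularity-weighted negatives, excluding positives.

    pop_counts: per-item interaction counts; positives: item ids to exclude.
    Assumes the catalog is much larger than positives + n_neg, so the loop
    terminates quickly.
    """
    rng = rng or np.random.default_rng(0)
    probs = np.asarray(pop_counts, dtype=float)
    probs /= probs.sum()
    seen = set(positives)
    negatives = []
    while len(negatives) < n_neg:
        # Over-draw a batch proportional to popularity, then filter out
        # positives and already-sampled items.
        for item in rng.choice(len(probs), size=n_neg, p=probs):
            if item not in seen:
                seen.add(int(item))
                negatives.append(int(item))
                if len(negatives) == n_neg:
                    break
    return negatives
```

Whether duplicates among negatives are allowed, and how ties with validation positives are handled, are exactly the kind of details that would need to match DCRec's code for the metrics to be comparable.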

I have tried all the methods I could think of, but I still cannot achieve the results presented in your paper. Could I kindly request some hints from you, or if possible, could you share the experimental SURGE project code with me at your convenience, solely for academic exchange purposes? Your assistance is greatly appreciated.

Hi @BiuBiuBia ,

Sorry for the delayed response. I followed this procedure to reproduce SURGE on my data:

  1. collect remapped data from RecBole
  2. collect exactly the same negatives sampled by RecBole
  3. in SURGE's recommenders environment, implement a new evaluation function that manually computes metrics from the collected samples
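Step 3 can be sketched like this (one positive scored against the collected negatives per test case; the strict ">" tie-break is an assumption to be checked against RecBole's convention):

```python
import numpy as np

def ndcg_at_k(pos_score, neg_scores, k=10):
    """NDCG@k for a single relevant item ranked against sampled negatives.

    With one relevant item, NDCG@k reduces to 1/log2(rank + 1) when the
    positive ranks within the top k, and 0 otherwise.
    """
    # Rank of the positive among 1 + len(neg_scores) candidates; ties go
    # to the positive under this ">" convention.
    rank = 1 + int(np.sum(np.asarray(neg_scores) > pos_score))
    return 1.0 / np.log2(rank + 1) if rank <= k else 0.0
```

Averaging ndcg_at_k over all collected test cases then gives the sampled ndcg@10.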

As you can see, it is truly laborious. And I don't really recommend reproducing SURGE, since it has a pretty serious defect, namely gradient issues (tsinghua-fib-lab/SIGIR21-SURGE#9).

Thank you for your response. Indeed, I encountered the gradient issue you mentioned while attempting to replicate the results of SURGE. I intend to set this approach aside for now and, if necessary, explore the validation method you described. Once again, I appreciate your reply.