Coder-Yu/SELFRec

Question about reproducing the best SimGCL results on yelp2018

lsc332211 opened this issue · 12 comments

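Hi, I've been running comparison experiments lately. I ran SimGCL with the best parameters from the paper, but I can't reach the results reported there, and the gap is fairly large. I'm not sure what the problem is and would appreciate any help.
Here are my parameters and results: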
Top 20
Hit Ratio:0.05351
Precision:0.02738
Recall:0.06133
NDCG:0.05087
2024-06-01 16:44:02,744 - SimGCL - INFO - training.set=./dataset/yelp2018/train.txt
2024-06-01 16:44:02,744 - SimGCL - INFO - test.set=./dataset/yelp2018/test.txt
2024-06-01 16:44:02,745 - SimGCL - INFO - model.name=SimGCL
2024-06-01 16:44:02,745 - SimGCL - INFO - model.type=graph
2024-06-01 16:44:02,745 - SimGCL - INFO - item.ranking=-topN 10,20
2024-06-01 16:44:02,745 - SimGCL - INFO - embedding.size=64
2024-06-01 16:44:02,745 - SimGCL - INFO - num.max.epoch=20
2024-06-01 16:44:02,745 - SimGCL - INFO - batch_size=8192
2024-06-01 16:44:02,745 - SimGCL - INFO - learnRate=0.001
2024-06-01 16:44:02,745 - SimGCL - INFO - reg.lambda=0.0001
2024-06-01 16:44:02,745 - SimGCL - INFO - SimGCL=-n_layer 3 -lambda 0.5 -eps 0.1
2024-06-01 16:44:02,745 - SimGCL - INFO - output.setup=-dir ./results/
2024-06-01 17:43:06,943 - SimGCL - INFO - ###Evaluation Results###
2024-06-01 17:43:06,943 - SimGCL - INFO - ['Top 10\n', 'Hit Ratio:0.03178\n', 'Precision:0.03253\n', 'Recall:0.03679\n', 'NDCG:0.04208\n', 'Top 20\n', 'Hit Ratio:0.05351\n', 'Precision:0.02738\n', 'Recall:0.06133\n', 'NDCG:0.05087\n']

Looks like you're taking the result after all 20 epochs have finished. Check the intermediate output; there is a best performance entry.

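This is the best performance, and it's from epoch 3. After that it kept dropping until the end, and by epoch 20 NDCG had fallen to around 0.03...
I haven't changed the code at all, so I'm quite puzzled...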

I'll check. Also, the batch size I used is 2048.
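
A quick way to spot a mismatch like this is to pull the `key=value` settings out of the startup log and compare them against a reference. The sketch below is only an illustration: `parse_settings`, `diff_settings`, and `REFERENCE` are hypothetical names, not part of SELFRec, and the reference holds only the batch size mentioned in this reply.

```python
import re

# Settings lines copied from the log posted at the top of this issue.
LOG_LINES = [
    "2024-06-01 16:44:02,745 - SimGCL - INFO - embedding.size=64",
    "2024-06-01 16:44:02,745 - SimGCL - INFO - batch_size=8192",
    "2024-06-01 16:44:02,745 - SimGCL - INFO - learnRate=0.001",
    "2024-06-01 16:44:02,745 - SimGCL - INFO - reg.lambda=0.0001",
]

# Matches "... - INFO - key=value" and captures the key and the value.
SETTING = re.compile(r" - INFO - ([\w.]+)=(.*)$")


def parse_settings(lines):
    """Collect key=value pairs from log lines; values stay as strings."""
    settings = {}
    for line in lines:
        match = SETTING.search(line)
        if match:
            settings[match.group(1)] = match.group(2).strip()
    return settings


def diff_settings(actual, expected):
    """Return {key: (actual, expected)} for every key that differs."""
    return {
        key: (actual.get(key), value)
        for key, value in expected.items()
        if actual.get(key) != value
    }


# Hypothetical reference: only the batch size comes from this thread.
REFERENCE = {"batch_size": "2048"}

settings = parse_settings(LOG_LINES)
mismatches = diff_settings(settings, REFERENCE)
print(mismatches)  # {'batch_size': ('8192', '2048')}
```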

OK, I'll run it again now with 2048.

Just tried it and there's no problem on my side.

It hasn't finished yet, but it has been dropping since the second epoch, and rec_loss is also very different from yours.

I can't see your image. How about re-downloading the code and running it again? Maybe something got changed somewhere?

It does drop at the beginning, and then it climbs back up.

Embedding Dimension: 64
Maximum Epoch: 20
Learning Rate: 0.001
Batch Size: 2048
Regularization Parameter: 0.0001
Specific parameters: n_layer:3 lambda:0.5 eps:0.1
Training Set Size: (user number: 31668, item number 38048, interaction number: 1237259)
Test Set Size: (user number: 31668, item number 36073, interaction number: 324147)

Best Performance
Epoch: 1, Hit Ratio:0.05024 | Precision:0.02571 | Recall:0.05776 | NDCG:0.04805

Epoch: 2, Hit Ratio:0.04026 | Precision:0.0206 | Recall:0.0464 | NDCG:0.03868

Epoch: 3, Hit Ratio:0.03701 | Precision:0.01894 | Recall:0.0427 | NDCG:0.03564

Epoch: 4, Hit Ratio:0.03582 | Precision:0.01833 | Recall:0.0413 | NDCG:0.03459

Epoch: 5, Hit Ratio:0.03426 | Precision:0.01753 | Recall:0.03945 | NDCG:0.03314

Epoch: 6, Hit Ratio:0.03132 | Precision:0.01603 | Recall:0.03601 | NDCG:0.03033
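
For reference, the per-epoch lines above can be scanned programmatically to find the best epoch and to confirm the decline. This is a minimal sketch that ranks epochs by NDCG; which metric the toolkit itself uses to pick "Best Performance" is not stated in this thread, so that choice is an assumption, and `parse_epoch` is a hypothetical helper.

```python
import re

# Per-epoch results copied from the output above.
EPOCH_LINES = """\
Epoch: 1, Hit Ratio:0.05024 | Precision:0.02571 | Recall:0.05776 | NDCG:0.04805
Epoch: 2, Hit Ratio:0.04026 | Precision:0.0206 | Recall:0.0464 | NDCG:0.03868
Epoch: 3, Hit Ratio:0.03701 | Precision:0.01894 | Recall:0.0427 | NDCG:0.03564
Epoch: 4, Hit Ratio:0.03582 | Precision:0.01833 | Recall:0.0413 | NDCG:0.03459
Epoch: 5, Hit Ratio:0.03426 | Precision:0.01753 | Recall:0.03945 | NDCG:0.03314
Epoch: 6, Hit Ratio:0.03132 | Precision:0.01603 | Recall:0.03601 | NDCG:0.03033
""".splitlines()

EPOCH = re.compile(r"Epoch: (\d+), (.*)")


def parse_epoch(line):
    """Turn one 'Epoch: N, name:value | ...' line into (N, {name: value})."""
    match = EPOCH.match(line.strip())
    if match is None:
        return None
    metrics = {}
    for part in match.group(2).split("|"):
        name, value = part.split(":")
        metrics[name.strip()] = float(value)
    return int(match.group(1)), metrics


records = [r for r in map(parse_epoch, EPOCH_LINES) if r is not None]

# Assumption: rank epochs by NDCG.
best_epoch, best_metrics = max(records, key=lambda r: r[1]["NDCG"])
print(best_epoch, best_metrics["NDCG"])  # 1 0.04805

# True when NDCG falls at every step, as it does in this run.
declining = all(a[1]["NDCG"] > b[1]["NDCG"] for a, b in zip(records, records[1:]))
print(declining)  # True
```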

And here is part of the loss output. The image upload failed, so I'll paste it as text.

training: 1 batch 100 rec_loss: 0.6928188800811768 cl_loss 3.2166385650634766
training: 1 batch 200 rec_loss: 0.6923909783363342 cl_loss 3.092113733291626
training: 2 batch 100 rec_loss: 0.6896593570709229 cl_loss 2.9386730194091797
training: 2 batch 200 rec_loss: 0.688927173614502 cl_loss 2.928771495819092
training: 6 batch 100 rec_loss: 0.6644822955131531 cl_loss 2.8438878059387207
training: 6 batch 200 rec_loss: 0.6604670882225037 cl_loss 2.8412563800811768
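
One way to read these numbers, assuming `rec_loss` is a BPR-style pairwise loss averaged over the batch (the thread does not confirm this): when the model scores positive and negative items identically, the loss equals -log(sigmoid(0)) = ln 2 ≈ 0.6931. The sketch below compares the logged values against that baseline; `bpr_loss` is a hypothetical helper written for illustration, not the SELFRec implementation.

```python
import math


def bpr_loss(pos_score, neg_score):
    """BPR loss for one (positive, negative) pair: -log(sigmoid(pos - neg))."""
    return -math.log(1.0 / (1.0 + math.exp(-(pos_score - neg_score))))


# Loss when positives and negatives get the same score: ln 2.
baseline = bpr_loss(0.0, 0.0)

# rec_loss values copied from the training output above.
rec_losses = [
    0.6928188800811768,
    0.6923909783363342,
    0.6896593570709229,
    0.688927173614502,
    0.6644822955131531,
    0.6604670882225037,
]

for loss in rec_losses:
    # A small gap means scores for positives and negatives are still close.
    print(f"rec_loss {loss:.4f}  below ln 2 by {baseline - loss:.4f}")
```

Under that assumption, the logged values stay close to ln 2 through epoch 6, which would mean the model has barely started separating positive from negative items.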

OK, thanks for taking the time to reply. I'm going to try again on another machine in a bit. learnRate=0.001 and reg.lambda=0.0001 are fine, right?

Yes. With the parameter settings from the leaderboard in the README you should definitely be able to reproduce it.

Thanks, it's solved. Have a nice day.
Top 20
Hit Ratio:0.06398
Precision:0.03274
Recall:0.0728
NDCG:0.05994
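
To put the fix in perspective, the Top 20 numbers from the first post can be compared with the final ones. This is just a small arithmetic sketch over values copied from this thread; `relative_change` is a hypothetical helper.

```python
# Top 20 metrics from the first post and from the final run in this thread.
first_run = {"Hit Ratio": 0.05351, "Precision": 0.02738, "Recall": 0.06133, "NDCG": 0.05087}
final_run = {"Hit Ratio": 0.06398, "Precision": 0.03274, "Recall": 0.0728, "NDCG": 0.05994}


def relative_change(before, after):
    """Relative change per metric: (after - before) / before."""
    return {name: (after[name] - before[name]) / before[name] for name in before}


changes = relative_change(first_run, final_run)
for name, change in changes.items():
    print(f"{name}: {change:+.1%}")
```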