THUDM/ComiRec

Questions about Performance (unable to reproduce)

UesugiErii opened this issue · 6 comments

I ran MIND 5 times on Amazon, and the final results are not ideal.
My run command was `python3 -u ./src/train.py --model_type MIND 2>&1 | tee MIND`.
The summary of the five runs is as follows:

test recall: 0.061517, test ndcg: 0.048181, test hitrate: 0.130982, test diversity: 0.236232
test recall: 0.061752, test ndcg: 0.048717, test hitrate: 0.131164, test diversity: 0.187387
test recall: 0.063693, test ndcg: 0.049689, test hitrate: 0.133500, test diversity: 0.193425
test recall: 0.061744, test ndcg: 0.049157, test hitrate: 0.131297, test diversity: 0.216244
test recall: 0.060717, test ndcg: 0.047014, test hitrate: 0.128497, test diversity: 0.197372

book_MIND_b128_lr0.001_d64_len20.zip

On Taobao, my run command was `python3 -u ./src/train.py --model_type MIND --dataset taobao 2>&1 | tee MIND_1_taobao`, and my results are below:

test recall: 0.070153, test ndcg: 0.208209, test hitrate: 0.474027, test diversity: 0.645880
test recall: 0.070160, test ndcg: 0.207974, test hitrate: 0.475348, test diversity: 0.645950
test recall: 0.069222, test ndcg: 0.205945, test hitrate: 0.471723, test diversity: 0.649751
test recall: 0.068939, test ndcg: 0.206321, test hitrate: 0.471631, test diversity: 0.648632
test recall: 0.068423, test ndcg: 0.204582, test hitrate: 0.469000, test diversity: 0.656467

The recall is clearly low. Could the author provide the training logs, or reproduce the results from this open-source code?
taobao_MIND_b256_lr0.001_d64_len50.zip
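
To check whether this gap could just be run-to-run noise, here is a minimal sketch (plain Python, recall values copied from the five MIND runs above) that summarizes the spread; the same check applies to the ComiRec-SA and ComiRec-DR runs below.

```python
import statistics

# Test recall values copied verbatim from the five MIND runs above.
recalls = {
    "Amazon (book)": [0.061517, 0.061752, 0.063693, 0.061744, 0.060717],
    "Taobao":        [0.070153, 0.070160, 0.069222, 0.068939, 0.068423],
}

for dataset, values in recalls.items():
    mean = statistics.mean(values)
    std = statistics.stdev(values)  # sample standard deviation
    print(f"{dataset}: recall = {mean:.6f} +/- {std:.6f}")
    # Amazon: ~0.0619 +/- 0.0011; Taobao: ~0.0694 +/- 0.0008.
```

The spread across runs is small (roughly 1-2% relative), so the shortfall does not look like random variance between runs.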

ComiRec-SA

book

test recall: 0.079056, test ndcg: 0.048871, test hitrate: 0.162605, test diversity: 0.221267
test recall: 0.079164, test ndcg: 0.049336, test hitrate: 0.162953, test diversity: 0.208704
test recall: 0.079847, test ndcg: 0.048716, test hitrate: 0.162788, test diversity: 0.225382
test recall: 0.079649, test ndcg: 0.049464, test hitrate: 0.162970, test diversity: 0.212688
test recall: 0.083070, test ndcg: 0.050521, test hitrate: 0.168287, test diversity: 0.233185

taobao

test recall: 0.078244, test ndcg: 0.248630, test hitrate: 0.512879, test diversity: 0.612056
test recall: 0.080251, test ndcg: 0.253487, test hitrate: 0.520557, test diversity: 0.610459
test recall: 0.079639, test ndcg: 0.251549, test hitrate: 0.518704, test diversity: 0.613363
test recall: 0.079661, test ndcg: 0.251497, test hitrate: 0.519083, test diversity: 0.613545
test recall: 0.081349, test ndcg: 0.253527, test hitrate: 0.524622, test diversity: 0.611884

book_ComiRec-SA_b128_lr0.001_d64_len20.zip
taobao_ComiRec-SA_b256_lr0.001_d64_len50.zip

ComiRec-DR

book

test recall: 0.078207, test ndcg: 0.065589, test hitrate: 0.169878, test diversity: 0.190251
test recall: 0.074753, test ndcg: 0.061694, test hitrate: 0.161645, test diversity: 0.189489
test recall: 0.080397, test ndcg: 0.066745, test hitrate: 0.172760, test diversity: 0.188704
test recall: 0.077085, test ndcg: 0.064527, test hitrate: 0.166283, test diversity: 0.188471
test recall: 0.076346, test ndcg: 0.063015, test hitrate: 0.164361, test diversity: 0.187302

taobao

test recall: 0.078295, test ndcg: 0.245738, test hitrate: 0.520138, test diversity: 0.622615
test recall: 0.076828, test ndcg: 0.241771, test hitrate: 0.512920, test diversity: 0.627486
test recall: 0.077234, test ndcg: 0.244358, test hitrate: 0.516002, test diversity: 0.617406
test recall: 0.078039, test ndcg: 0.244164, test hitrate: 0.519288, test diversity: 0.628052
test recall: 0.077539, test ndcg: 0.244570, test hitrate: 0.517701, test diversity: 0.630594

taobao_ComiRec-DR_b256_lr0.005_d64_len50.zip
I lost the ComiRec-DR logs for the Amazon dataset.

The run commands are attached below:
run_cmd.zip
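
For anyone who wants to rerun everything, here is a minimal sketch of the kind of loop I used. It is my own convenience script, not part of the repo; the `model_type` and `dataset` strings are inferred from the attached log-file names, and each run mirrors the `python3 -u ./src/train.py ... 2>&1 | tee ...` commands above.

```python
import subprocess

# Rerun each configuration five times, capturing one log file per run.
for model in ["MIND", "ComiRec-SA", "ComiRec-DR"]:
    for dataset in ["book", "taobao"]:
        for run in range(1, 6):
            log_path = f"{dataset}_{model}_run{run}.log"
            with open(log_path, "w") as log:
                subprocess.run(
                    ["python3", "-u", "./src/train.py",
                     "--model_type", model, "--dataset", dataset],
                    stdout=log, stderr=subprocess.STDOUT,
                )
```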

Most of these results do not reach the numbers reported in the paper. Could the author reproduce them and upload the run commands and logs?
@cenyk1230

My empirical results on the two datasets are not ideal either.
