Validation data
Opened this issue · 6 comments
Hi,
Thanks for making the code public!
You mentioned the train/valid/test split in paper but I cannot find the validation data in this code. Could you provide the code that produces the numbers in your paper? Thank you!
Hi,
Thanks for your interest. Sorry for the late reply after a busy week.
If I remember correctly, the training, validation, and testing datasets are split chronologically. Hence, you can use the first half of the testing instances as validation.
For all baselines and NGCF, the performance reported in our paper is computed over the union of the validation and testing parts. So you can just use the datasets provided on GitHub to reproduce our results.
Thanks.
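For anyone looking for a concrete recipe, here is a minimal sketch (not code from this repo) of carving a validation split out of the released test file, following the "first half of the testing instances as validation" suggestion above. It assumes the "uid item1 item2 ..." line format used by the data loader and that each user's held-out items are listed chronologically; the paths and the output names valid.txt and test_half.txt are hypothetical examples.

# Minimal sketch (not from the NGCF repo) of splitting the released test file
# into a validation half and a remaining test half.
# Assumptions: lines look like "uid item1 item2 ..." and each user's items are
# in chronological order; file paths and names are hypothetical examples.
def split_test_file(test_path, valid_path, new_test_path):
    with open(test_path) as fin, \
         open(valid_path, 'w') as fvalid, \
         open(new_test_path, 'w') as ftest:
        for line in fin:
            tokens = line.strip().split(' ')
            if len(tokens) < 2:
                continue
            uid, items = tokens[0], tokens[1:]
            half = len(items) // 2
            if half > 0:
                # earlier half of each user's held-out items -> validation
                fvalid.write(' '.join([uid] + items[:half]) + '\n')
            # remaining items -> final test
            ftest.write(' '.join([uid] + items[half:]) + '\n')

split_test_file('Data/gowalla/test.txt',
                'Data/gowalla/valid.txt',
                'Data/gowalla/test_half.txt')

This is only one way to realize the chronological first-half rule; please verify against your copy of the data that the item order within each line is indeed chronological.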
Thanks Xiang for your clarification. Doesn't such an evaluation procedure risk overfitting to the test data, since part (even half) of the test data is seen during training?
We keep all baselines and our model on the same page, for a fair comparison. Therefore, our results are reliable and reproducible. Of course, we will follow stricter experimental settings in our future work. Thanks for your valuable suggestions!
Hi, thanks for your work. I cannot find the validation data in your code or in the released data either. And what is the meaning of "the performance reported in our paper is done over the union of validation and testing parts"? As I understand from the code below, there is no separate validation set: the model is evaluated on the test data every 10 epochs during training, and the best result is reported as the final result, right? Or did you conduct the experiments as described in the paper, which would not match this code?
for epoch in range(args.epoch):
    t1 = time()
    loss, mf_loss, emb_loss, reg_loss = 0., 0., 0., 0.
    n_batch = data_generator.n_train // args.batch_size + 1

    for idx in range(n_batch):
        users, pos_items, neg_items = data_generator.sample()
        _, batch_loss, batch_mf_loss, batch_emb_loss, batch_reg_loss = sess.run(
            [model.opt, model.loss, model.mf_loss, model.emb_loss, model.reg_loss],
            feed_dict={model.users: users, model.pos_items: pos_items,
                       model.node_dropout: eval(args.node_dropout),
                       model.mess_dropout: eval(args.mess_dropout),
                       model.neg_items: neg_items})
        loss += batch_loss
        mf_loss += batch_mf_loss
        emb_loss += batch_emb_loss
        reg_loss += batch_reg_loss

    if np.isnan(loss) == True:
        print('ERROR: loss is nan.')
        sys.exit()

    # print the test evaluation metrics each 10 epochs; pos:neg = 1:10.
    if (epoch + 1) % 10 != 0:
        if args.verbose > 0 and epoch % args.verbose == 0:
            perf_str = 'Epoch %d [%.1fs]: train==[%.5f=%.5f + %.5f]' % (
                epoch, time() - t1, loss, mf_loss, reg_loss)
            print(perf_str)
        continue

    t2 = time()
    users_to_test = list(data_generator.test_set.keys())
    ret = test(sess, model, users_to_test, drop_flag=True)

    t3 = time()

    loss_loger.append(loss)
    rec_loger.append(ret['recall'])
    pre_loger.append(ret['precision'])
    ndcg_loger.append(ret['ndcg'])
    hit_loger.append(ret['hit_ratio'])

    if args.verbose > 0:
        perf_str = 'Epoch %d [%.1fs + %.1fs]: train==[%.5f=%.5f + %.5f + %.5f], recall=[%.5f, %.5f], ' \
                   'precision=[%.5f, %.5f], hit=[%.5f, %.5f], ndcg=[%.5f, %.5f]' % \
                   (epoch, t2 - t1, t3 - t2, loss, mf_loss, emb_loss, reg_loss, ret['recall'][0], ret['recall'][-1],
                    ret['precision'][0], ret['precision'][-1], ret['hit_ratio'][0], ret['hit_ratio'][-1],
                    ret['ndcg'][0], ret['ndcg'][-1])
        print(perf_str)

    cur_best_pre_0, stopping_step, should_stop = early_stopping(ret['recall'][0], cur_best_pre_0,
                                                                stopping_step, expected_order='acc',
                                                                flag_step=5)

    # *********************************************************
    # early stopping when cur_best_pre_0 is decreasing for ten successive steps.
    if should_stop == True:
        break

    # *********************************************************
    # save the user & item embeddings for pretraining.
    if ret['recall'][0] == cur_best_pre_0 and args.save_flag == 1:
        save_saver.save(sess, weights_save_path + '/weights', global_step=epoch)
        print('save the weights in path: ', weights_save_path)
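For comparison, below is a rough sketch, under stated assumptions, of the stricter protocol this thread is asking about: drive early stopping and checkpoint selection with a held-out validation set, and touch the test set only once at the end. It reuses sess, model, args, data_generator, test(), early_stopping(), save_saver and weights_save_path from the snippet above; data_generator.valid_set and train_one_epoch() are hypothetical, and whether test() can be pointed at validation interactions depends on the repo's batch_test implementation, so treat those calls as illustrative rather than as the repo's actual code.

# Rough sketch of validation-based model selection; NOT the repo's code.
# Hypothetical pieces: data_generator.valid_set (e.g. loaded from a separate
# valid.txt) and train_one_epoch() (the per-batch training loop shown above).
users_to_valid = list(data_generator.valid_set.keys())   # hypothetical attribute
users_to_test = list(data_generator.test_set.keys())

cur_best_valid, stopping_step = 0., 0
for epoch in range(args.epoch):
    train_one_epoch()  # the inner training loop from the snippet above

    if (epoch + 1) % 10 != 0:
        continue

    # early stopping and checkpointing are driven by validation recall only
    ret_valid = test(sess, model, users_to_valid, drop_flag=True)
    cur_best_valid, stopping_step, should_stop = early_stopping(
        ret_valid['recall'][0], cur_best_valid, stopping_step,
        expected_order='acc', flag_step=5)
    if ret_valid['recall'][0] == cur_best_valid and args.save_flag == 1:
        save_saver.save(sess, weights_save_path + '/weights', global_step=epoch)
    if should_stop:
        break

# the test set is evaluated once, with the checkpoint selected on validation
ret_test = test(sess, model, users_to_test, drop_flag=True)
print('final test recall@K:', ret_test['recall'])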
I'm having the same doubt here. Is there any update on this issue? Otherwise, the results in the paper may not be trustworthy.
Same concern as above: it is highly likely that the validation is being conducted on the test set, so the reported results and performance are not generalizable. Could you please explain this for us? @xiangwang1223