johnny12150/NISER

Consider masking out the padding embedding at the tail of each session and adding a positional embedding.

Closed this issue · 7 comments

In the original paper the normalization is computed over the items within one session. However, when implementing the algorithm we need to pad some positions so that all sessions in a batch have the same length, so when computing the L2 norm the items at the padded (irrelevant) positions should be ignored. The author also found that adding a positional embedding helps a little. By the way, I'm looking forward to your numbers on Star GNN. I have reproduced the numbers on yoochoose1/64, but I can't get the same numbers on diginetica. Thank you a lot!
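Concretely, something like this is what I have in mind (just a rough sketch assuming a PyTorch implementation; the tensor names and shapes are illustrative, not taken from your repo):

```python
import torch
import torch.nn.functional as F

# Illustrative sizes: a batch of B sessions padded to max_len.
B, max_len, hidden = 32, 20, 256
item_emb = torch.randn(B, max_len, hidden)      # padded item embeddings
lengths = torch.randint(1, max_len + 1, (B,))   # true session lengths

# mask: 1 for real items, 0 for the padded tail of each session
mask = (torch.arange(max_len).unsqueeze(0) < lengths.unsqueeze(1)).float()

# learnable positional embedding added before the readout
pos_emb = torch.nn.Embedding(max_len, hidden)
item_emb = item_emb + pos_emb(torch.arange(max_len)).unsqueeze(0)

# L2-normalize each item embedding, then zero out the padded positions
# so they do not leak into the session representation
item_emb = F.normalize(item_emb, p=2, dim=-1)
item_emb = item_emb * mask.unsqueeze(-1)
```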

Thanks for the advice.
I will update the code to see if the performance improves.

By the way, could you share the Star GNN code with me?
I am still checking my code since I can reach the performance claimed in the paper on both datasets.

Do you mean you have reached the performance claimed in the paper on both datasets? If so, could you please release the code? Thanks! I will share my code with you once I have organized it well. I just found that layer norm is useless, and I am trying to improve the numbers on diginetica.

Oops! I meant I haven't reached the performance yet; it should be "can't".
Sorry for the inconvenience.

I found that fixing the order of the training samples and using hidden size 256 (as in the original paper) with L2 normalization increases the numbers greatly. Maybe you can try them in your Star GNN code.
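For reference, the hidden size / L2 normalization part would look roughly like this (a sketch assuming PyTorch; `scale` is just a placeholder for the constant the cosine scores are multiplied by, and the sizes are illustrative):

```python
import torch
import torch.nn.functional as F

hidden, n_items = 256, 40000            # larger hidden size instead of 100
scale = 16.0                            # illustrative scaling constant for the scores

item_table = torch.nn.Embedding(n_items, hidden)
session_repr = torch.randn(32, hidden)  # session embeddings from the GNN readout

# normalize both the session and the candidate item embeddings, then score
# with a scaled dot product (i.e. scaled cosine similarity)
s = F.normalize(session_repr, p=2, dim=-1)
v = F.normalize(item_table.weight, p=2, dim=-1)
logits = scale * s @ v.t()              # (batch, n_items)
```

As far as I understand, fixing the order of the training samples is just a matter of seeding and disabling shuffling in the data loader.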

So does it seem that the performance gain comes from the L2 norm rather than the star graph topology?

Mainly from the larger hidden size and the L2 norm (which was introduced in NISER); the star topology only brings a small gain. If you set the hidden size to 100 like SR-GNN and NISER, the performance drops a lot.

Hidden size and L2 norm really boost the performance of many models!