xiaxin1998/DHCN

How to generate the session embedding during the testing phase

rowedenny opened this issue · 10 comments

Hi, thanks for sharing this interesting work.

I am confused about the generation of session embedding during the training and testing phases. Specifically, during the training, we can generate a session embedding via the hypergraph, which means we use the interactions in the other sessions to learn the embedding for items within the current batch.

When it comes to testing, we only have access to the interactions within the current session, without the interactions from other sessions. However, when I read the code, I found that, no matter whether in the training or testing phase, the function build always goes through the hypergraph convolution layer and then generates an embedding for the current session, as in:

DHCN/model.py

Line 129 in aeb54db

self.item_embedding, self.item_embedding_layer0 = self.HypergraphConv()

I am not sure if I am missing some key detail, but I find it hard to understand how the current implementation generates session embeddings during the testing phase. Would you mind clarifying?

Hi,
We have noticed this problem and we are fixing it. So, currently, the results in the arXiv preprint need to be updated. I suggest you follow the updated implementation in a few days.

Thanks for the quick response. Would you mind pinging me when it is fixed?

I will. Thank you for your interest in our work.

Hi, sorry to bother you again.
Would you mind helping me figure out the best approach to generate the session embedding with the hypergraph during inference?

Typically, we can aggregate over all the items in the session given the item embeddings. However, with batch training on the hypergraph, the item embeddings change across different batches, so I find it hard to figure out the best way.

Thank you for your time.
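For concreteness, the aggregation referred to above can be sketched as simple mean pooling over the items' embeddings (an illustrative toy example with made-up embeddings, not the paper's attention-based readout):

```python
import numpy as np

def session_embedding(item_embeddings, session):
    """Aggregate (mean-pool) the embeddings of the items in one session.

    item_embeddings: (num_items, dim) array, row i = embedding of item i.
    session: list of item ids observed in the session.
    """
    return item_embeddings[session].mean(axis=0)

# Toy example: a hypothetical 5-item catalogue with 3-dim embeddings.
emb = np.arange(15, dtype=float).reshape(5, 3)
s = [0, 2]  # a session that contains items 0 and 2
print(session_embedding(emb, s))  # mean of rows 0 and 2 -> [3. 4. 5.]
```

The question is precisely which `item_embeddings` matrix to plug in at test time, since batch training produces a different one per batch.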

Hi, I don't quite follow the question. "however in hypergraph with batch training, this item embedding would be changed across different batches" — what if the item embeddings change across batches? Does that have a negative impact on generating optimal session embeddings? Can you clarify?
By the way, we have addressed the issues in our code and paper. You can refer to the updated paper (the Dropbox link) and implementation for some helpful thoughts (maybe).

Sounds awesome. I noticed that the code has been updated; I will definitely take a look at it.

Please allow me to clarify my puzzle:
In the training phase, we collect a batch of sequences, construct the hypergraph, and then the corresponding embedding will be generated from the initial item embeddings. Having the final embedding for the items in the batch, we can generate a representation for each session.
However, in the testing phase, the model is not aware of sequences from other sessions. The gap is how to generate the final item embeddings from the initial item embeddings, so that we can compute the session representation. Does this mean that during the testing phase we should first take the historical interactions across all sessions, construct the hypergraph, and then generate the final embeddings for the items?

I see. In our paper, we perform graph convolution on all the observed data, i.e., we build an adjacency matrix over all the training data. In this way, we circumvent batch-wise training and can use the item embeddings at the last layer (methods like SR-GNN, which are based on batch training, only use the 0th-layer item embeddings). In the test phase, we still obtain the session embeddings with the adjacency matrix over the training data, and there is no need to visit other sessions in the test set. You can refer to our implementation for the details.
It should be mentioned that performing convolution on the full graph is a bit time-consuming (12 minutes per iteration on Diginetica), but it does yield better results.
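To make the full-graph idea concrete, here is a minimal numpy sketch (not the authors' actual PyTorch implementation; the item-to-item propagation D⁻¹HB⁻¹Hᵀ follows the usual hypergraph-convolution form, everything else — names, toy data — is illustrative). The incidence matrix is built once over all training sessions, so test-time session embeddings are just lookups into the convolved table:

```python
import numpy as np

def hypergraph_conv(sessions, num_items, emb, num_layers=2):
    """One-shot hypergraph convolution over ALL training sessions.

    sessions: list of item-id lists (each training session = one hyperedge).
    emb: (num_items, dim) initial item embeddings (layer 0).
    Returns item embeddings averaged over layers 0..num_layers; at test
    time a session embedding is computed from these rows directly,
    without building any graph over the test set.
    """
    # Incidence matrix H: H[i, e] = 1 iff item i occurs in hyperedge e.
    H = np.zeros((num_items, len(sessions)))
    for e, s in enumerate(sessions):
        H[list(set(s)), e] = 1.0
    Dv = H.sum(axis=1, keepdims=True)   # vertex (item) degrees
    Be = H.sum(axis=0, keepdims=True)   # hyperedge (session) degrees
    # Normalized propagation matrix: D^-1 H B^-1 H^T.
    A = (H / np.maximum(Dv, 1)) @ (H / np.maximum(Be, 1)).T
    layers = [emb]
    for _ in range(num_layers):
        layers.append(A @ layers[-1])   # parameter-free propagation
    return np.mean(layers, axis=0)      # average layers 0..L

# Toy run: 3 items, 2 training sessions, identity initial embeddings.
sessions = [[0, 1], [1, 2]]
final = hypergraph_conv(sessions, num_items=3, emb=np.eye(3))
```

Note how item 1, which co-occurs with both 0 and 2, ends up mixing information from both neighbors even though they never appear in the same session.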

Please correct me if I misunderstand:
In the training phase, for one session [i_1, i_2, i_3, i_4], does this correspond to one hyperedge, or to 3 hyperedges after data augmentation?
A minor concern: if there exists a hyperedge connecting i_1 and i_3, then during training with data augmentation we will predict i_3 given [i_1, i_2]. Is that OK?

In the testing phase, the session representation is actually generated from the longest session, so it is just from [i_1, i_2, i_3, i_4].

First question: 3 hyperedges
Second: Yes
Third: In the test phase, the session is also split into multiple subsegments.

We build the adjacency matrix over the augmented dataset. However, I think maybe it is also ok to just use the original sessions.
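To illustrate the "3 hyperedges" answer: the standard prefix augmentation used in session-based recommendation splits [i_1, i_2, i_3, i_4] as follows (a hypothetical helper for illustration, not code from the repo):

```python
def augment(session):
    """Split one session into prefix/target training examples.

    Each prefix acts as one hyperedge with a next-item label, so a
    session of length n yields n - 1 examples (hyperedges).
    """
    return [(session[:k], session[k]) for k in range(1, len(session))]

examples = augment(["i1", "i2", "i3", "i4"])
print(examples)
# 3 examples: (['i1'], 'i2'), (['i1', 'i2'], 'i3'), (['i1', 'i2', 'i3'], 'i4')
```

Building the adjacency matrix over these augmented prefixes, versus over the original full sessions, changes only which item co-occurrences form hyperedges.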

Awesome, I think I have no more questions. I will take a further look at the implementation.

Thank you so much for your detailed replies.

By the way, I agree with the following hypothesis of yours. My observation is that when training with the original sessions instead of multiple hyperedges per session, there is no significant difference.

However, I think maybe it is also ok to just use the original sessions.