gusye1234/LightGCN-PyTorch

RAM problem while generating adjacency matrix

kontrabas380 opened this issue · 3 comments

Hello,

I wanted to try this solution on my own dataset, which consists of around 1 million users, about 300k items, and about 42 million interactions for training. Unfortunately, after I prepared the data and started the script, the process was killed for exceeding 240 GiB of RAM. It happens while generating the adjacency matrix in the dataloader, on this line:
adj_mat[:self.n_users, self.n_users:] = R

Is there a way to do this differently, or is my dataset simply too big to use with LightGCN?

Best regards

Hi, for readability and convenience we chose to use one big matrix to hold both R and R.T, so right now that's the only way in this implementation.
We're aware of the potential RAM overflow problem too, so we're happy to discuss a more memory-efficient way to store the adjacency matrix while keeping LightGCN's functionality.
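
For reference, one memory-efficient alternative is to assemble the two blocks with scipy's sp.bmat instead of slice-assigning into a big LIL matrix. A minimal sketch, assuming R is the (n_users x n_items) interaction matrix in csr format (self.UserItemNet in the repo's dataloader):

```python
import numpy as np
import scipy.sparse as sp

def build_norm_adj(R):
    """R: (n_users x n_items) user-item interaction csr_matrix
    (self.UserItemNet in the repo's dataloader)."""
    # sp.bmat assembles [[0, R], [R.T, 0]] block-wise, avoiding both
    # a dense intermediate and the expensive LIL slice assignment
    adj_mat = sp.bmat([[None, R], [R.T, None]],
                      format="csr", dtype=np.float32)
    # symmetric normalization D^{-1/2} A D^{-1/2}, matching what
    # getSparseGraph() computes
    rowsum = np.asarray(adj_mat.sum(axis=1)).flatten()
    with np.errstate(divide="ignore"):
        d_inv = np.power(rowsum, -0.5)
    d_inv[np.isinf(d_inv)] = 0.0
    d_mat = sp.diags(d_inv)
    return d_mat @ adj_mat @ d_mat
```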

You can simply rewrite the loading method.

For now, the repo builds UserItemNet, which is NxM, and assigns it into slices of the bigger matrix, which costs a lot of time and RAM.
You can instead read each item's remapped ID as remapped_item_idx + num_of_users, then use that to generate the sparse adjacency matrix directly, which comes down to a single csr_matrix call (see the sketch after these steps).

To do so, modify the reading method here: https://github.com/gusye1234/LightGCN-PyTorch/blob/master/code/dataloader.py#L241
Then modify this method: https://github.com/gusye1234/LightGCN-PyTorch/blob/master/code/dataloader.py#L332

And note that, in order to know the number of users before you read the train user-item data (UserItemNet), you may need to read user_list.txt in advance.
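
Putting those steps together, a rough sketch of the direct construction. It assumes the repo's train.txt format (each line: "user item1 item2 ...") and a user_list.txt with one user per line; the file names and the one-user-per-line layout are assumptions, so adapt them to your preprocessing (e.g. skip a header row if your file has one). The normalization from the earlier sketch can then be applied to the result.

```python
import numpy as np
import scipy.sparse as sp

def build_adj_from_files(user_list_path, train_path):
    # read user_list.txt first so n_users is known before the
    # train data is parsed (assumed layout: one user per line)
    with open(user_list_path) as f:
        n_users = sum(1 for _ in f)

    users, items = [], []
    # repo-style train.txt: each line is "user item1 item2 ..."
    with open(train_path) as f:
        for line in f:
            ids = line.strip().split()
            if len(ids) < 2:
                continue
            u = int(ids[0])
            for i in ids[1:]:
                users.append(u)
                # remap: item node ID = remapped_item_idx + n_users,
                # so users and items share one node-ID space
                items.append(int(i) + n_users)

    n_nodes = max(items) + 1  # = n_users + n_items for contiguous IDs
    # one csr_matrix call: each interaction (u, i) contributes both the
    # (u, i) and (i, u) entries of the symmetric adjacency
    rows = np.concatenate([users, items])
    cols = np.concatenate([items, users])
    data = np.ones(len(rows), dtype=np.float32)
    return sp.csr_matrix((data, (rows, cols)), shape=(n_nodes, n_nodes))
```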

And besides all that (which I have tried), you still need to think about the training cost of a larger graph, since it will take much more time than the original small graph.

And besides all that, may I ask the team one more question, @gusye1234: LightGCN is not based on GraphSAGE, right? It uses the whole graph for propagation in the computer() method, which is more of a GCN style. I thought LightGCN was simply NGCF with all the non-linear layers removed, but it seems that's not quite it? I'm a little confused; please correct me if I'm wrong.

Thanks @Gongzq5 for the patient reply.
As for your question, you can think of LightGCN as a reduced NGCF, and that was exactly the original idea: if the non-linear transformations are not as useful as we thought in NGCF, then why not use a reduced NGCF?
And it turns out the result is a pretty clean and simple message-passing mechanism, in a GCN style.