xiangwang1223/neural_graph_collaborative_filtering

sparse_tensor_dense_matmul cause ResourceExhaustedError

gumanchang opened this issue · 3 comments

Hi, I met this error using my own dataset (about 2 million users); the GPU is an NVIDIA Tesla V100 32GB.
code:

```python
temp_embed.append(tf.sparse_tensor_dense_matmul(A_fold_hat[f], ego_embeddings))
```

error:

```
ResourceExhaustedError (see above for traceback): OOM when allocating tensor with shape[2231226,64] and type float on /job:localhost/replica:0/task:0/device:GPU:0 by allocator GPU_0_bfc
```
So, is something wrong?
Thank you :)

Hi, thanks for your interest. The error should be caused by the large number of users. That is, in NGCF, we need to create an adjacency (or Laplacian) matrix, which is then converted into a sparse tensor in TensorFlow; its size is (#users + #items) × (#users + #items).
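
To make the scale concrete, here is a minimal sketch of that conversion step (the helper name, the identity-matrix placeholder, and the row count are illustrative assumptions, not the exact repo code):

```python
import numpy as np
import scipy.sparse as sp
import tensorflow as tf  # TF 1.x style, matching the repo

def convert_sp_mat_to_sp_tensor(X):
    # Turn a scipy sparse matrix into a tf.SparseTensor (COO format).
    coo = X.tocoo().astype(np.float32)
    indices = np.stack([coo.row, coo.col], axis=1).astype(np.int64)
    return tf.SparseTensor(indices, coo.data, coo.shape)

# The reported shape implies n_users + n_items = 2,231,226. Note that the
# tensor which failed to allocate, [2231226, 64] float32, is itself only
# about 2_231_226 * 64 * 4 bytes ~= 0.53 GiB; memory pressure comes from
# many such intermediates plus the sparse Laplacian coexisting on the GPU.
n = 2_231_226                         # assumed: n_users + n_items
A_hat = sp.identity(n, format='csr')  # placeholder for the real Laplacian
A_tensor = convert_sp_mat_to_sp_tensor(A_hat)
```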

One possible solution might be to set the hyperparameter 'n_fold' to a larger value (say, 10000), but I cannot guarantee that this will work.
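
For reference, 'n_fold' splits the Laplacian into row blocks before the propagation, roughly like this (a sketch continuing the one above; `split_A_hat`, the value `n_fold = 100`, and the embedding variable are assumptions for illustration):

```python
def split_A_hat(X, n_fold, n_rows):
    # Slice the Laplacian into n_fold row blocks, so each sparse-dense
    # matmul materializes only a slice of the result at a time.
    A_fold_hat = []
    fold_len = n_rows // n_fold
    for i in range(n_fold):
        start = i * fold_len
        end = n_rows if i == n_fold - 1 else (i + 1) * fold_len
        A_fold_hat.append(convert_sp_mat_to_sp_tensor(X[start:end]))
    return A_fold_hat

n_fold = 100  # assumed value for the sketch
A_fold_hat = split_A_hat(A_hat, n_fold, n)
ego_embeddings = tf.Variable(tf.random_normal([n, 64], stddev=0.01))

# Per-layer propagation over the folds (the line that raised the OOM):
temp_embed = []
for f in range(n_fold):
    temp_embed.append(tf.sparse_tensor_dense_matmul(A_fold_hat[f], ego_embeddings))
side_embeddings = tf.concat(temp_embed, 0)
```

A larger 'n_fold' shrinks each slice, but every extra fold adds its own loop iteration, kernel launch, and result tensor, which is likely why a very large value slows things down.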

Thanks.

Hi, I tried your suggestion, but I reached the opposite conclusion: with 'n_fold' = 1 it works well, while with 'n_fold' = 10000 it is too slow to run.
I know some GCN implementations do not have this 'n_fold' hyperparameter.
So, can I remove this hyperparameter without losing precision?

Hi, yes, you can remove the hyperparameter directly. I set this parameter only to guard against memory errors.
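
For completeness, the unfolded propagation is a single matmul over the full sparse tensor (continuing the sketch above); concatenating the per-fold row-block products equals the full product, so removing the fold cannot change the numbers, only the peak memory profile:

```python
# Unfolded: one sparse-dense matmul over the entire Laplacian.
# Mathematically identical to tf.concat of the per-fold results.
side_embeddings = tf.sparse_tensor_dense_matmul(A_tensor, ego_embeddings)
```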