My training is very slow

Question

linhaojia13 opened this issue 5 years ago · 6 comments

Hi, I trained the sem_seg_sparse model on 2 Tesla V100 GPUs with the default configuration, but training is extremely slow: a single forward pass on one batch takes several minutes. I checked the GPUs and they are working. Do you have any idea what could cause this?

Answer 1 · 2020-03-27T14:42:32.000Z

Hi, the sparse version is very slow because PyTorch Geometric is not optimized for kNN graphs. I suggest using sem_seg_dense instead, which uses a dense data format and implements the GCNs in native PyTorch.

Answer 2 · 2020-03-27T15:54:56.000Z

The dense version should be faster, but it is still slow because the network is very deep. To speed up training further, you can use 14 layers instead of 28, use MRConv instead of EdgeConv by setting "--conv mr", or use a smaller kNN neighborhood by lowering "--kernel_size". Alternatively, you can try the PPI experiment, which uses a smaller dataset.
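
To get a feel for why depth and neighborhood size matter, here is a back-of-the-envelope sketch. It is not taken from the repository: the cost formula, the channel width of 64, the 4096 points per sample, and the k values are all assumptions chosen for illustration, and it assumes each layer rebuilds its kNN graph from a pairwise distance matrix.

```python
def per_pass_cost(num_layers, num_points, k, channels=64):
    """Rough proxy for the work in one forward pass (hypothetical model).

    Assumes every layer (1) builds a kNN graph from an N x N distance
    matrix over C-dimensional features and (2) applies an EdgeConv-style
    2C -> C transform to k neighbors per point.
    """
    knn_cost = num_points * num_points * channels        # pairwise distances
    conv_cost = num_points * k * 2 * channels * channels  # per-edge transform
    return num_layers * (knn_cost + conv_cost)


# Illustrative settings only; not the repository's defaults.
baseline = per_pass_cost(num_layers=28, num_points=4096, k=16)
lighter = per_pass_cost(num_layers=14, num_points=4096, k=8)
ratio = baseline / lighter
print(f"estimated speed-up: {ratio:.2f}x")
```

Under these assumptions, halving the depth halves the cost directly, while a smaller k only shrinks the per-edge term, so the real gain depends on how much time the kNN step itself takes.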

On Fri, Mar 27, 2020 at 6:40 PM 林豪佳 wrote: Hi, thank you for your prompt reply. I followed your advice and used the dense version, but I ran into the same problem as with the sparse one.

Answer 3 · 2020-03-27T16:23:32.000Z

Thank you for your advice. Now I realize that deep_gcns_torch may not be a friendly repo for me and my GPUs... haha.
By the way, is there a difference in speed between the TensorFlow version and the PyTorch version of DeepGCNs?

Answer 4 · 2020-03-27T16:39:24.000Z

The TensorFlow version is faster, but it still takes about two days to finish 100 epochs.

Answer 5 · 2020-03-27T16:55:50.000Z

Thank you very much.

Answer 6 · 2020-05-26T14:11:06.000Z

Hi @lightaime

I've found that there are two ways to compute the sparse kNN neighborhood when running sem_seg_sparse: knn_graph_matrix and knn_graph.

knn_graph_matrix is based on a pairwise distance matrix, but it assumes that every graph in the batch has the same number of nodes. knn_graph, on the other hand, comes from torch_cluster and supports a different number of nodes per graph in a batch.
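
To make the difference concrete, here is a minimal sketch of the two batching styles in plain Python, without torch. The function names knn_dense and knn_sparse are hypothetical, and this is not the code from the repository or from torch_cluster. It only illustrates the idea: the matrix-based approach needs equally sized point sets, while a batch vector lets each graph have its own size.

```python
from math import dist


def knn_dense(clouds, k):
    """Matrix-style kNN: every cloud must have the same number of points.

    Returns, for each cloud, a list of k neighbor indices per point
    (indices are local to the cloud; the point itself is included).
    """
    n = len(clouds[0])
    if any(len(cloud) != n for cloud in clouds):
        raise ValueError("dense kNN needs the same number of points per cloud")
    result = []
    for cloud in clouds:
        # Full N x N pairwise distance matrix for this cloud.
        d = [[dist(p, q) for q in cloud] for p in cloud]
        result.append([sorted(range(n), key=row.__getitem__)[:k] for row in d])
    return result


def knn_sparse(points, batch, k):
    """Batch-vector kNN: batch[i] is the graph id of points[i].

    Returns (neighbor, center) edges with global indices. Graphs may have
    different sizes, and neighbors never cross graph boundaries.
    """
    edges = []
    for i, p in enumerate(points):
        same = [j for j in range(len(points)) if batch[j] == batch[i] and j != i]
        same.sort(key=lambda j: dist(p, points[j]))
        edges.extend((j, i) for j in same[:k])
    return edges


# Two graphs of different sizes (3 and 2 points) in one flat batch.
points = [(0, 0), (1, 0), (5, 0), (0, 0), (0, 2)]
batch = [0, 0, 0, 1, 1]
edges = knn_sparse(points, batch, k=1)
```

Calling knn_dense on the same data split into a 3-point cloud and a 2-point cloud raises a ValueError, which mirrors the equal-size assumption described above.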

Please correct me if I got anything wrong or missed something.