zgcr/SimpleAICV_pytorch_training_examples

When using DataParallel, GPU memory footprint increases over time

jlhou opened this issue · 2 comments

jlhou commented

Thank you very much for your outstanding work. However, when I use DataParallel for training, GPU memory usage keeps growing over time, and eventually CUDA stops the program. May I ask why?

jlhou commented

The experiment is: imagenet_experiments/resnet_imagenet_DataParallel_train_example

The error detail is: cuda runtime error (719): unspecified launch failure at /opt/conda/conda-bld/pytorch_1570910687230/work/aten/src/THC/generic/THCTensorMath.cu:26

Thank you very much!

On 11/25/2020 16:54, zgcr wrote:
Hi, could you tell me which experiment you were running when the problem occurred? Can you give me more details about the error message?

zgcr commented

I just tried this experiment, but the error didn't appear. Are all your Python packages and CUDA installed correctly?
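
One way to check the installation is a short script like the one below. It is only a sketch: `collect_env_report` is a hypothetical helper name, not something from this repository. It reports the Python and PyTorch versions, the CUDA version PyTorch was built against, the visible GPUs, and the memory currently allocated on each device.

```python
import importlib
import platform


def collect_env_report():
    """Gather version details that may help diagnose CUDA launch failures."""
    report = {"python": platform.python_version()}
    try:
        torch = importlib.import_module("torch")
    except (ImportError, OSError):
        # PyTorch is missing, or its shared libraries failed to load.
        report["torch"] = None
        return report

    report["torch"] = torch.__version__
    # CUDA version PyTorch was built against (None for CPU-only builds).
    report["cuda_build"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        device_ids = range(torch.cuda.device_count())
        report["gpus"] = [torch.cuda.get_device_name(i) for i in device_ids]
        # Bytes currently occupied by tensors on each device.
        report["allocated_bytes"] = [
            torch.cuda.memory_allocated(i) for i in device_ids
        ]
    return report


if __name__ == "__main__":
    for key, value in collect_env_report().items():
        print(f"{key}: {value}")
```

Logging `torch.cuda.memory_allocated` every few hundred training iterations would also show whether the footprint really grows during training or whether the failure has another cause.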