dhlab-epfl/dhSegment

Taking too much time in training

CS-savvy opened this issue · 3 comments

The model takes 2 hours per epoch on 2300 images with a batch size of 1, but you mention that the page detection model, trained on 1600 images for 30 epochs, took only 4 hours.

Can someone tell me the reason?
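To put a number on the gap, here is a quick back-of-the-envelope comparison. It is only a sketch: it assumes both figures are wall-clock training times and that each epoch passes over every image once. The helper `seconds_per_image` is a hypothetical name used for illustration and is not part of dhSegment.

```python
def seconds_per_image(hours, images, epochs=1):
    """Average wall-clock seconds spent per training image."""
    return hours * 3600 / (images * epochs)


# This run: 2 hours for one epoch over 2300 images (batch size 1).
mine = seconds_per_image(hours=2, images=2300)

# Reported run: 4 hours for 30 epochs over 1600 images.
reported = seconds_per_image(hours=4, images=1600, epochs=30)

# How many times slower this run is per image.
slowdown = mine / reported

print(f"this run: {mine:.2f} s/image")
print(f"reported: {reported:.2f} s/image")
print(f"slowdown: {slowdown:.1f}x")
```

Under these assumptions, this run spends about 3.1 s per image against roughly 0.3 s for the reported run, so it is around 10 times slower per image.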

Can you give your GPU specs?

Thanks for replying, @solivr.
I am training the dhSegment model on an Azure VM with a Tesla K80 GPU (11 GB of GPU memory, NVIDIA driver 390.116), 6 vCPUs, and 56 GB of RAM.

https://www.techpowerup.com/gpu-specs/tesla-k80.c2616

GPU: [screenshot: gpu_config]

CPU: [screenshot: cpu_config]

@solivr @CS-savvy are there any updates here? My instances are crashing because of memory problems, even though I use 4 GPUs with 16 GB each. Can anyone suggest an improvement?