yoonsanghyu/Dual-Path-Transformer-Network-PyTorch

Hi! How long does it take to train one batch in your training?

Closed this issue · 1 comment

Hi, thank you for your code.
I want to know how long one batch takes in your training, and how long the whole training took.
A single batch seems to take very long in my training on V100s.
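To measure per-batch time independently of the repo's logging, a minimal sketch (the `train_step` callable and the dummy batches are placeholders, not names from this repo):

```python
import time

def timed_batches(train_step, batches):
    """Run train_step on each batch, yielding (loss, elapsed_ms) per batch."""
    for batch in batches:
        start = time.perf_counter()
        loss = train_step(batch)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        yield loss, elapsed_ms

# Example with a dummy "training step" that just sums the batch.
batches = [[1, 2, 3], [4, 5, 6]]
results = list(timed_batches(sum, batches))
```

Note that on GPU, timing like this is only meaningful if the step synchronizes (e.g. the loss is moved to CPU); otherwise asynchronous CUDA kernels make the wall-clock reading misleading.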

lufee:19773:20233 [0] misc/ibvwrap.cc:63 NCCL WARN Failed to open libibverbs.so[.1]
lufee:19773:20233 [0] NCCL INFO NET/Socket : Using [0]lo:127.0.0.1<0> [1]eth0:172.17.0.5<0>
NCCL version 2.4.8+cuda10.0
lufee:19773:20233 [0] NCCL INFO nranks 3
lufee:19773:20233 [0] NCCL INFO Setting affinity for GPU 0 to 03ff,f0003fff
lufee:19773:20233 [2] NCCL INFO Using 256 threads, Min Comp Cap 7, Trees disabled
lufee:19773:20233 [2] NCCL INFO Channel 00 :    0   1   2
lufee:19773:20233 [0] NCCL INFO Ring 00 : 0[0] -> 1[1] via direct shared memory
lufee:19773:20233 [1] NCCL INFO Ring 00 : 1[3] -> 2[4] via P2P/direct pointer
lufee:19773:20233 [2] NCCL INFO Ring 00 : 2[2] -> 0[0] via direct shared memory
lufee:19773:20233 [0] NCCL INFO Launch mode Group/CGMD
Epoch 1 | Iter 1 | Average Loss 6.943 | Current Loss 6.943422 | 19712.5 ms/batch

Ah, it's my fault.
I forgot to change the arg print_freq.
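For context, a progress line like the `Epoch 1 | Iter 1 | ... | ms/batch` output above is typically produced by a logger gated on `print_freq`; a hedged sketch (this mirrors the log format shown, not the repo's actual solver code):

```python
import time

def log_progress(epoch, i, total_loss, current_loss, start, print_freq):
    """Print a progress line every print_freq iterations (i is 1-indexed).

    ms/batch is the wall-clock time since `start` averaged over all
    i iterations so far, so the very first reading also absorbs
    CUDA/NCCL warm-up cost and can look inflated.
    """
    if i % print_freq == 0:
        ms_per_batch = 1000.0 * (time.time() - start) / i
        line = (f"Epoch {epoch} | Iter {i} | "
                f"Average Loss {total_loss / i:.3f} | "
                f"Current Loss {current_loss:.6f} | "
                f"{ms_per_batch:.1f} ms/batch")
        print(line)
        return line
    return None
```

With a large `print_freq`, nothing is printed for many iterations, which can make a single batch appear to take far longer than it does.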