NVIDIA/mellotron

Training requires a large amount of memory

aijianiula0601 opened this issue · 3 comments

Thanks for your work!

My environment:
8x V100 GPUs, 32 GB of memory per GPU.

I train with 4 GPUs using multiproc, and it takes up almost 32 GB on each GPU. The batch_size is set to 32, not 4*32=128. Does it really need that much memory? Why not switch to DataParallel? Thank you!
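To make the arithmetic explicit, here is a minimal sketch (the n_gpus and batch_size values are just the ones from this report, not read from the repo's config): with the multiproc launcher each GPU runs its own process with its own batch, so the effective global batch is the per-GPU batch_size times the number of GPUs.

```python
# Minimal sketch with assumed values, not code from the repo.
n_gpus = 4          # hypothetical: number of GPUs used for training
batch_size = 32     # hypothetical: per-process batch size from hparams
effective_batch = batch_size * n_gpus
print(effective_batch)  # 128 samples per optimizer step across all GPUs
```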

@aijianiula0601
VRAM (GPGPUs) or System Memory (RAM)?


https://github.com/NVIDIA/mellotron/blob/master/distributed.py#L51

The multiproc is just a slightly modified version of DataParallel; memory usage should be the same as (or very close to) PyTorch's built-in version.
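For context, a minimal sketch of the gradient all-reduce idea that distributed wrappers like this commonly build on (an illustration only, not the repo's exact code; it assumes torch.distributed has already been initialized with init_process_group):

```python
import torch.distributed as dist

def allreduce_gradients(model, world_size):
    """Illustrative sketch: after backward(), sum each parameter's gradient
    across all processes and divide by the world size, so every replica
    steps with the same averaged gradient."""
    for param in model.parameters():
        if param.grad is not None:
            dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
            param.grad.data /= world_size
```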


I changed it to DataParallel. The batch size can be set to 128, but training is slow.
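For reference, a minimal sketch of the DataParallel change being described (the model here is a hypothetical stand-in, not Mellotron's Tacotron2): nn.DataParallel keeps a single process that scatters each 128-sample batch across the GPUs and gathers the outputs back every step, which is a likely source of the slowdown compared with one process per GPU.

```python
import torch
from torch import nn

model = nn.Linear(80, 80)  # hypothetical stand-in for the actual model
if torch.cuda.is_available() and torch.cuda.device_count() > 1:
    # Single process; inputs are scattered across GPUs, outputs gathered back.
    model = nn.DataParallel(model).cuda()
```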

With our implementation, try decreasing the batch size.
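A minimal sketch of what that might look like, assuming batch_size is defined in hparams.py as in Tacotron2-style repos (check the actual file for the real field name):

```python
# Hypothetical hparams fragment: lowering the per-GPU batch size reduces
# activation memory roughly linearly.
batch_size = 16  # down from 32
```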