mravanelli/pytorch-kaldi

Training on multi-gpu very slow

sun-peach opened this issue · 4 comments

I am training my ASR model with pytorch-kaldi and noticed that training is very slow: 10% of one chunk takes 10 minutes. I have 10 chunks and will run 15 epochs, which works out to about 10 days of training.

My dataset has about 2k hours of audio, which I split into 10 chunks. I use multiple GPUs, and my GPU memory is 32 GB. I am following cfg/librispeech_liGRU_fmllr.cfg, except that I use Adam and 4 liGRU layers (instead of the 5 layers set originally).

I have searched in the "Issues" and learnt that the developers have already optimized the multi-GPU training process. But I still see my GPU utils is around 30%, which means not fully used. I would like to know is there anyway that I can speed up the training a little bit?

Thank you very much!

Hi! This is quite a hard problem in itself. 2k hours is a lot, and 10 days of training on a single GPU sounds reasonable to me. Two things you can try:

1. Use something other than the LiGRU (LSTM and GRU are faster thanks to cuDNN, but they also give worse performance).
2. Multi-GPU with DataParallel is bottlenecked by Python, and the only real fix would be DistributedDataParallel, which I think is impossible to adapt to pytorch-kaldi. So just set multi_gpu=True and use batch_size = max_batch_size_for_one_gpu * number_of_gpus. Training time doesn't scale linearly with the number of GPUs, but you can easily get down to about 3 days with 4 GPUs.
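To make point 2 concrete, here is a minimal sketch of the plain PyTorch DataParallel pattern it refers to; `ToyRNN` and its dimensions are made-up placeholders, not pytorch-kaldi code:

```python
import torch
import torch.nn as nn

# Stand-in for the acoustic model, only to illustrate the DataParallel pattern.
class ToyRNN(nn.Module):
    def __init__(self, feat_dim=40, hidden=550, n_out=3000):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_out)

    def forward(self, x):
        h, _ = self.rnn(x)
        return self.out(h)

model = ToyRNN().cuda()
n_gpus = torch.cuda.device_count()
if n_gpus > 1:
    # DataParallel replicates the model on every GPU and splits the batch
    # along dim 0, so the global batch size should be
    # max_batch_size_for_one_gpu * n_gpus to keep every replica busy.
    model = nn.DataParallel(model)

# (batch, time, features): each replica only sees batch_size / n_gpus sequences.
batch = torch.randn(16 * max(n_gpus, 1), 300, 40).cuda()
logits = model(batch)
```

Because each replica only processes its slice of the batch, keeping the single-GPU batch size while enabling multiple GPUs mostly adds synchronization overhead, which is why the global batch size has to grow with the number of GPUs.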

Thank you. I use the settings listed below:

use_cuda=True
multi_gpu=True
N_epochs_tr=15
N_chunks=50
batch_size_train=16
max_seq_length_train=1500
increase_seq_length_train=True
start_seq_len_train=300
multply_factor_seq_len_train=5
batch_size_valid=8
max_seq_length_valid=1400

It seems it will take about 12 days (my sequences are long). If you think all my settings are reasonable, then I will just wait.

How many GPUs do you have?

4 GPUs.
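For reference, a back-of-envelope version of the batch-size scaling suggested earlier, assuming (not verified) that the batch_size_train=16 above is roughly the largest batch that fits on one 32 GB GPU:

```python
# Hypothetical numbers: per_gpu_batch comes from batch_size_train above and is
# assumed to be close to the largest batch that fits on a single GPU.
per_gpu_batch = 16
n_gpus = 4
global_batch = per_gpu_batch * n_gpus
print(global_batch)  # 64 -> candidate batch_size_train with multi_gpu=True
```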