manojpamk/pytorch_xvectors

which mfcc.conf do you use?


hi,
First of all thank you for your fantastic project!
I can't tell which mfcc.conf file is used in your project; conf/mfcc.conf is not included in the repository.
Hoping for your reply soon.
thanks!

Hi,

mfcc.conf comes from the voxceleb recipe in kaldi. I forgot to add the symlink from pytorch_run.sh; it should be in place as of the recent commit.
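For reference, that recipe's mfcc.conf typically contains settings along the following lines; please double-check against the copy in your own Kaldi checkout, since the values below are only what the VoxCeleb recipe usually ships with:

--sample-frequency=16000
--frame-length=25
--low-freq=20
--high-freq=7600
--num-mel-bins=30
--num-ceps=30
--snip-edges=false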

Best,
Manoj

Hi,
Thanks for your quick reply!
I found that when I set num_workers=2 on the DataLoader in train_xent.py, it throws the following error:
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 333, in reduce_storage
RuntimeError: unable to open shared memory object </torch_20266_1801196515> in read-write mode
Traceback (most recent call last):
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/queues.py", line 234, in _feed
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 51, in dumps
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/site-packages/torch/multiprocessing/reductions.py", line 337, in reduce_storage
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/reduction.py", line 191, in DupFd
  File "/home/lcf/anaconda3/envs/python36/lib/python3.6/multiprocessing/resource_sharer.py", line 48, in __init__
OSError: [Errno 24] Too many open files

Does it not support multi-processing for loading data?

I faced the same issue, and unfortunately I couldn't find a solution. From what I understood: since we access the egs.*.scp files serially, it is not possible to read them with num_workers > 0.
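One untested idea, in case you want to experiment: give each worker its own reader via worker_init_fn instead of sharing one handle inherited from the parent process. This is only a sketch; the reopen() method on the dataset is hypothetical and does not exist in the repo:

from torch.utils.data import DataLoader, get_worker_info

def worker_init_fn(worker_id):
    # Runs once inside each worker process before it starts producing batches.
    info = get_worker_info()
    if hasattr(info.dataset, 'reopen'):
        info.dataset.reopen()  # hypothetical helper: re-open the egs.*.scp reader in this worker

# par_data_loader = DataLoader(dataset, batch_size=1, num_workers=2,
#                              worker_init_fn=worker_init_fn)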

Let me know if you find a workaround!

Best,
Manoj

I found it can be solved by adding the code below to train_xent.py:

import torch.multiprocessing as mp
mp.set_sharing_strategy('file_system')
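As an aside, another workaround that is sometimes suggested for the "Too many open files" error is to raise the process's open-file limit (the same thing ulimit -n does). A small untested sketch, in case the sharing-strategy change alone is not enough:

import resource

# Raise the soft open-file limit up to the hard limit allowed by the system.
soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))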

However, the sharing-strategy fix leads to a new problem. When I set num_workers=32 it runs, but the code stays inside the for loop for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc)).

The reason I want to use multiple processes to load data is that training has very low GPU utilization and is slow.
In the end, I found that the main factor limiting training speed is not data loading but the model definition.

In the models.py script, the following code is very time-consuming:

if self.training:
    x = x + torch.randn(x.size()).to(self.device)*eps

After I modified it as below, training is much faster:

if self.training:
    #x = x + torch.randn(x.size()).to(self.device)*eps
    shape = x.size()
    noise = torch.cuda.FloatTensor(shape) if torch.cuda.is_available() else torch.FloatTensor(shape)
    torch.randn(shape, out=noise)
    x += noise*eps
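For what it's worth, an equivalent and arguably simpler variant is to generate the noise directly on x's device, which likewise avoids the CPU-to-GPU copy of the original line (this assumes x already lives on the target device):

if self.training:
    # randn_like allocates the noise with x's dtype and device, so no host-to-device transfer is needed
    x = x + torch.randn_like(x) * eps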

When I set num_workers=32 it runs, but the code stays inside the for loop for _, (X, Y) in par_data_loader: and never reaches print('Archive processing time: %1.3f' %(time.time()-archive_start_time)) or print('Validation accuracy is %1.2f precent' %(valAcc))

Does the logging statement if batchI-loggedBatch >= args.logStepSize: execute?

After I modified it as below, training is much faster

I can confirm the speedup! Do you want to create a PR?

The if batchI-loggedBatch >= args.logStepSize: statement does execute, and batchI grows larger than numBatchsPerArk, but the for loop mentioned above still doesn't break.

yes, I will create a PR, thanks!