Issue when running the pipeline for anonymization using x-vectors and neural waveform models
ArneDefauw opened this issue · 3 comments
After what seemed like a successful installation of the software using the ./install.sh script, I encountered the following error when running the ./run.sh script:
Stage a.1: Generating pseudo-speakers for libri_dev_enrolls.
Computing PLDA affinity scores of each source speaker to each pool speaker.
cut: exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp: No such file or directory
bash: line 1: 8013 Aborted (core dumped) ( ivector-plda-scoring --normalize-length=true "ivector-copy-plda --smoothing=0.0 exp/models/2_xvect_extr/exp/xvector_nnet_1a/plda - |" "ark:ivector-subtract-global-mean exp/models/2_xvect_extr/exp/xvector_nnet_1a/mean.vec scp:exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_xvector.scp ark:- | transform-vec exp/models/2_xvect_extr/exp/xvector_nnet_1a/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" "ark:ivector-subtract-global-mean exp/models/2_xvect_extr/exp/xvector_nnet_1a/mean.vec scp:exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp ark:- | transform-vec exp/models/2_xvect_extr/exp/xvector_nnet_1a/transform.mat ark:- ark:- | ivector-normalize-length ark:- ark:- |" "cat 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/fake_trials/trial_1272' | cut -d\ --fields=1,2 |" exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_pool_scores/affinity_1272 ) 2>> exp/scores/log/libritts_pool_scoring.log >> exp/scores/log/libritts_pool_scoring.log
run.pl: job failed, log is in exp/scores/log/libritts_pool_scoring.log
['local/anon/gen_pseudo_xvecs.py', 'data/libri_dev_enrolls', 'data/libritts_train_other_500', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/spk_pool_scores', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon', 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libri_dev_enrolls/pseudo_xvecs', 'spk', 'false', 'farthest', '0']
Same gender speakers will be selected.
Randomization level: spk
Proximity: farthest
Reading source spk2gender.
Reading source spk2utt.
Reading pool spk2gender.
Reading pool xvectors.
Traceback (most recent call last):
File "local/anon/gen_pseudo_xvecs.py", line 85, in <module>
for key, xvec in reader:
File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/highlevel.py", line 128, in __iter__
k, v = next(self.generator)
File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/matio.py", line 78, in load_scp_sequential
with open_like_kaldi(fname, 'r') as fd:
File ".../Voice-Privacy-Challenge-2020/venv/lib/python3.8/site-packages/kaldiio/utils.py", line 205, in open_like_kaldi
return io.open(name, mode, encoding=encoding)
FileNotFoundError: [Errno 2] No such file or directory: 'exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500/spk_xvector.scp'
Apparently the spk_xvector.scp file could not be created or found.
The full log file can be found here:
https://drive.google.com/file/d/1fMagP7K-6YOieSFpvPVn8dTr8x7fLZli/view?usp=sharing
and the log file generated by Kaldi (exp/scores/log/libritts_pool_scoring.log) can be found here:
https://drive.google.com/file/d/1TmMzlBY-P9pZ8SuOzcjyeEXKV5Krkdjh/view?usp=sharing
Hi @ArneDefauw,
It seems that in Stage 7, the run of sid/nnet3/xvector/extract_xvectors.sh did not complete.
In your log:
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors for data/libritts_train_other_500
sid/nnet3/xvector/extract_xvectors.sh: extracting xvectors from nnet
Stage 8: Making evaluation subsets...
This corresponds to Stage 0 in extract_xvectors.sh.
However, extract_xvectors.sh has two further stages (1 and 2) that do not appear in your log file.
Could you please attach the log of extract_xvectors.sh for Stage 7 (model dir: exp/models/2_xvect_extr/exp/xvector_nnet_1a, data: data/libritts_train_other_500)?
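As a quick self-check, one could test whether the extraction outputs exist before the later stages try to read them. This is only a sketch: spk_xvector.scp is the file named in the error above, while the per-utterance xvector.scp name is an assumption about what the earlier stages write.

```shell
# Sketch: report which expected x-vector outputs are missing.
check_xvectors() {
  dir="$1"
  status=0
  for f in xvector.scp spk_xvector.scp; do
    if [ ! -f "$dir/$f" ]; then
      echo "MISSING: $dir/$f"
      status=1
    fi
  done
  return $status
}

check_xvectors exp/models/2_xvect_extr/exp/xvector_nnet_1a/anon/xvectors_libritts_train_other_500 \
  || echo "extract_xvectors.sh did not finish; re-run its later stages"
```

If spk_xvector.scp is reported missing while earlier outputs are present, that points to the extraction stopping before its final stages.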
Hi @Natalia-T, thanks for the swift reply.
I ran the sid/nnet3/xvector/extract_xvectors.sh script standalone, and it now goes through all stages: https://drive.google.com/file/d/1fCs1jSvLVg224l0HM0pddPqntuTIlXNe/view?usp=sharing .
However, it now fails at a later stage due to problems with CUDA:
https://drive.google.com/file/d/1Cx_pLtAEj4Knroin32IuePj5JewokYzE/view?usp=sharing
https://drive.google.com/file/d/1Ho26Nwz3BUYwsjk4lMBJGPR9LaJsFJWj/view?usp=sharing
Is it necessary to run the code on a GPU?
Hi @ArneDefauw,
The program fails at the PPG (BN) feature extraction stage because use_gpu=yes is set by default.
This stage can be performed on CPU. To do this, pass the corresponding value in the call to extract_bn.sh by specifying --use_gpu no.
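For example (illustrative only: keep whatever arguments your recipe already passes to extract_bn.sh and just add the flag):

```sh
# Run the BN feature extraction on CPU instead of GPU.
./extract_bn.sh --use_gpu no <other-args...>
```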
However, for some other (later) stages (i.e. the TTS part), a GPU is necessary.