Xflick/EEND_PyTorch

I encountered a problem when running run.sh

Closed this issue · 5 comments

First of all, thank you for open-sourcing this code.
There is no folder named exp_large, and some files in this directory, such as avg.th or transformer.th, cannot be found. Can you provide these files? Or, if retraining does not require them, how should I pass in these files?
And can the adapt stage be skipped?

Hi, exp_large is created automatically during the training phase. Model files such as avg.th and transformer*.th are saved during training and are not needed for retraining. If you just want to run the inference or adaptation stage without retraining the whole network, my pretrained models are available in this repo.
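A minimal sketch for checking which per-epoch checkpoints exist before the model-averaging step runs. It assumes checkpoints are named `transformer<epoch>.th` inside the model directory and that the last ten epochs are averaged; both match the log pasted later in this thread, but neither is confirmed as a fixed convention of the repo. The function names and default values are hypothetical.

```python
from pathlib import Path


def expected_checkpoints(model_dir, max_epochs=100, num_avg=10):
    """Return the checkpoint paths that averaging would read.

    Assumption: files are named transformer<epoch>.th and the last
    `num_avg` epochs out of `max_epochs` are averaged.
    """
    first = max_epochs - num_avg + 1
    return [Path(model_dir) / f"transformer{epoch}.th"
            for epoch in range(first, max_epochs + 1)]


def missing_checkpoints(model_dir, max_epochs=100, num_avg=10):
    """Return the expected checkpoint paths that are not on disk."""
    return [path
            for path in expected_checkpoints(model_dir, max_epochs, num_avg)
            if not path.is_file()]


if __name__ == "__main__":
    missing = missing_checkpoints("exp_large/models")
    if missing:
        print("Training has not produced these files yet:")
        for path in missing:
            print("  ", path)
```

If every expected file is reported missing, training most likely stopped before saving any checkpoint, so the averaging step has nothing to read.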

then how should I pass in these files?

I didn't quite get you here. Training the network does not require passing avg.th or transformer*.th. For adaptation, the model is passed through init_model.

python eend/bin/train.py -c $adapt_conf $train_adapt_dir $dev_adapt_dir $model_adapt_dir --initmodel $init_model

For testing, the model is passed through test_model.
python eend/bin/infer.py -c $infer_conf $test_dir $test_model $infer_out_dir

The adaptation stage can be skipped, but there is a huge performance degradation if you simply apply your model trained on simulation data to other real scenarios (such as callhome, etc.).
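The two commands above can be sketched as argument lists in Python. This only illustrates where each model file goes; the flags and positional order are copied from the commands quoted in this reply, while the function names and the empty-argument check are hypothetical additions. The check is there because an unset shell variable expands to nothing, which is a likely way to end up with an error such as `argument -c/--config: expected one argument`.

```python
def adapt_command(adapt_conf, train_dir, dev_dir, model_dir, init_model):
    """Build the adaptation command; the initial model goes to --initmodel."""
    return ["python", "eend/bin/train.py", "-c", adapt_conf,
            train_dir, dev_dir, model_dir, "--initmodel", init_model]


def infer_command(infer_conf, test_dir, test_model, out_dir):
    """Build the inference command; the model is the third positional argument."""
    return ["python", "eend/bin/infer.py", "-c", infer_conf,
            test_dir, test_model, out_dir]


def require_non_empty(command):
    """Raise ValueError if any argument is empty (e.g. an unset variable)."""
    empty = [index for index, arg in enumerate(command) if not arg]
    if empty:
        raise ValueError(f"empty argument(s) at position(s) {empty}")
    return command
```

A list returned by `require_non_empty` can be handed to `subprocess.run` unchanged; all paths passed in are placeholders to be replaced with your own.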

Thank you for your quick reply. I have another question. Does this code support diarization of multiple speakers (an unknown number of speakers)? When I try to train with the AMI dataset, the following error is reported:

`[ INFO : 2020-11-05 09:22:40,290 ] - namespace(batchsize=64, config=[<yamlargparse.Path object at 0x7f1d61e35590>], context_size=7, frame_shift=80, frame_size=200, gpu=1, gradclip=5, gradient_accumulation_steps=1, hidden_size=256, initmodel='', input_transform='logmel23_mn', label_delay=0, lr=1.0, max_epochs=100, model_save_dir='exp_large/models', model_type='Transformer', noam_warmup_steps=100000.0, num_frames=500, num_speakers=2, optimizer='noam', resume='', sampling_rate=16000, seed=777, subsampling=10, train_data_dir='/home/tp/projects/kaldi_style_data/train', transformer_encoder_dropout=0.1, transformer_encoder_n_heads=4, transformer_encoder_n_layers=4, valid_data_dir='/home/tp/projects/kaldi_style_data/dev')
10095 chunks
2086 chunks
Traceback (most recent call last):
File "eend/bin/train.py", line 63, in <module>
train(args)
File "/home/tp/projects/EEND_PyTorch-master/eend/pytorch_backend/train.py", line 68, in train
Y, T = next(iter(train_set))
StopIteration
Start model averaging
Namespace(ifiles=['exp_large/models/transformer91.th', 'exp_large/models/transformer92.th', 'exp_large/models/transformer93.th', 'exp_large/models/transformer94.th', 'exp_large/models/transformer95.th', 'exp_large/models/transformer96.th', 'exp_large/models/transformer97.th', 'exp_large/models/transformer98.th', 'exp_large/models/transformer99.th', 'exp_large/models/transformer100.th'], ofile='/home/tp/projects/EEND_PyTorch-master/pretrained_models/large/model_callhome.th')
Traceback (most recent call last):
File "eend/bin/model_averaging.py", line 35, in <module>
average_model(args.ifiles, args.ofile)
File "eend/bin/model_averaging.py", line 18, in average_model
tmpmodel = torch.load(ifile)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: 'exp_large/models/transformer91.th'
Start adapting
Traceback (most recent call last):
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1787, in parse_known_args
namespace, args = self._parse_known_args(args, namespace)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1993, in _parse_known_args
start_index = consume_optional(start_index)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1923, in consume_optional
arg_count = match_argument(action, selected_patterns)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 2088, in _match_argument
raise ArgumentError(action, msg)
argparse.ArgumentError: argument -c/--config: expected one argument

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "eend/bin/train.py", line 57, in <module>
args = parser.parse_args()
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/yamlargparse.py", line 158, in parse_args
cfg = super().parse_args(args=args, namespace=namespace)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1755, in parse_args
args, argv = self.parse_known_args(args, namespace)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/argparse.py", line 1794, in parse_known_args
self.error(str(err))
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/yamlargparse.py", line 671, in error
raise ParserError(message)
yamlargparse.ParserError: argument -c/--config: expected one argument
Start model averaging
Namespace(ifiles=['/transformer91.th', '/transformer92.th', '/transformer93.th', '/transformer94.th', '/transformer95.th', '/transformer96.th', '/transformer97.th', '/transformer98.th', '/transformer99.th', '/transformer100.th'], ofile='/home/tp/projects/EEND_PyTorch-master/pretrained_models/large/model_callhome.th')
Traceback (most recent call last):
File "eend/bin/model_averaging.py", line 35, in <module>
average_model(args.ifiles, args.ofile)
File "eend/bin/model_averaging.py", line 18, in average_model
tmpmodel = torch.load(ifile)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 584, in load
with _open_file_like(f, 'rb') as opened_file:
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 234, in _open_file_like
return _open_file(name_or_buffer, mode)
File "/home/tp/anaconda3/envs/pyaudio/lib/python3.7/site-packages/torch/serialization.py", line 215, in __init__
super(_open_file, self).__init__(open(name, mode))
FileNotFoundError: [Errno 2] No such file or directory: '/transformer91.th'`

If I want to train the model on simulated data (such as mini_librispeech) and adapt it on the AMI dataset, how should I set the simu_opts_num_speaker parameter? And should num_speakers in adapt.yaml be set to the maximum number of speakers in AMI?

Looking forward to your reply, thanks
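A note on the `StopIteration` raised by `next(iter(train_set))` in the log above, even though chunks were found. One possible explanation, assuming `train_set` is a map-style dataset that defines `__getitem__` but not `__iter__`: Python then iterates it through the legacy sequence protocol, and any `IndexError` raised inside `__getitem__` is reported as `StopIteration`. Writing a label for a speaker index beyond `num_speakers` would be one such `IndexError`. The toy class below is hypothetical and only demonstrates the Python mechanism; it is not the repo's dataset code.

```python
class ToyChunkDataset:
    """Map-style dataset: defines __getitem__ and __len__, but no __iter__."""

    def __init__(self, chunk_speakers, num_speakers, num_frames=4):
        self.chunk_speakers = chunk_speakers  # speaker indices per chunk
        self.num_speakers = num_speakers
        self.num_frames = num_frames

    def __len__(self):
        return len(self.chunk_speakers)

    def __getitem__(self, index):
        # Label matrix of shape (num_frames, num_speakers), filled with zeros.
        labels = [[0] * self.num_speakers for _ in range(self.num_frames)]
        for speaker in self.chunk_speakers[index]:
            for row in labels:
                # Raises IndexError when speaker >= num_speakers.
                row[speaker] = 1
        return labels


def first_item_outcome(dataset):
    """Mimic `next(iter(dataset))` and report what happens."""
    try:
        next(iter(dataset))
    except StopIteration:
        return "StopIteration"
    return "ok"
```

With two speakers in the first chunk and `num_speakers=2` the first item loads; with a third speaker the same call ends in `StopIteration`, hiding the underlying `IndexError`. Indexing the dataset directly (`train_set[0]`) would surface the real exception, if this is indeed the cause.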

Hi, traditional EEND only supports a fixed number of speakers. It is possible to set the number of training speakers to match the maximum number of speakers in your test set, but training becomes really slow (I personally found the training time unacceptable with more than 3 speakers).

From what you have described, I recommend reading their latest work, which supports a variable number of speakers:

Neural Speaker Diarization with Speaker-Wise Chain Rule
End-to-End Speaker Diarization for an Unknown Number of Speakers with Encoder-Decoder Based Attractors
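To see what "the maximum number of speakers in your test set" would be for a Kaldi-style data directory, a rough sketch is to count distinct speakers per recording. It assumes the standard Kaldi layouts, `segments` lines of the form `<utt-id> <rec-id> <start> <end>` and `utt2spk` lines of the form `<utt-id> <spk-id>`; the function name is hypothetical.

```python
from collections import defaultdict


def max_speakers_per_recording(segments_lines, utt2spk_lines):
    """Return the largest number of distinct speakers in any one recording."""
    # Map each utterance id to its speaker id.
    utt2spk = {}
    for line in utt2spk_lines:
        if line.strip():
            utt, spk = line.split()[:2]
            utt2spk[utt] = spk

    # Collect the set of speakers seen in each recording.
    speakers = defaultdict(set)
    for line in segments_lines:
        if line.strip():
            utt, rec = line.split()[:2]
            speakers[rec].add(utt2spk[utt])

    return max((len(spks) for spks in speakers.values()), default=0)


if __name__ == "__main__":
    segments = ["u1 rec1 0.0 1.5", "u2 rec1 1.0 2.0", "u3 rec2 0.0 3.0"]
    utt2spk = ["u1 spkA", "u2 spkB", "u3 spkA"]
    print(max_speakers_per_recording(segments, utt2spk))
```

In practice the two lists would come from reading the `segments` and `utt2spk` files of the data directory line by line.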

Thank you very much for your guidance. I will close this issue.

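First of all, thank you for open-sourcing this code. There is no folder named exp_large, and some files in this directory, such as _avg.th_ or _transformer.th_, cannot be found. Can you provide these files? Or, if retraining does not require these files, how should I format them? And can the adaptation stage be bypassed?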

Did you manage to solve this? I am running into the same problem.