thu-spmi/CAT

Refactoring

Closed this issue · 21 comments

Hi, I am trying to make a Dockerfile and refactor the directory structure.

In my docker branch, I have finished a Dockerfile based on Kaldi's official gpu-latest image. PyTorch is at version 1.2.0 because the base image, kaldiasr/kaldi:gpu-latest, is still on CUDA 10.0. I'm not familiar with PyTorch, so I just copied the installation instructions from the official site.

Update: I now use PyTorch 1.5 as the base image.

After that work, I found that the scripts in wsj/steps and utils do not follow Kaldi's guidelines.
Some of them were copied from old versions of Kaldi, Eesen, etc. I would suggest untracking the typical scripts from old Kaldi and moving the remainder into scripts/ctc-crf. Every egs directory would then have two soft links, steps and utils, pointing to Kaldi's wsj, and one soft link pointing to scripts/ctc-crf.

Do you agree with the above changes? I'm happy to make a PR.

aky15 commented

Thanks for your contribution to our work. The structure seems clearer now. Have you tested the scripts (in particular, the dependencies between scripts) to make sure they work?

aky15 commented

Also, the installation guidance in readme.md needs to be modified. It may be a good idea to provide optional choices for those who already have Kaldi and PyTorch installed.

It is still a work in progress. I will update the aishell folder; then maybe you can follow the new scripts to update wsj and the other LDC corpora? I have no LDC membership, and no powerful machines.

aky15 commented

Thank you and I'll try it.

Hi @aky15,
Is src/batchnorm_src out of date? Can it be replaced with something like torch.nn.BatchNorm1d/2d?

BTW, I'm running the aishell egs with a single GPU. It will take me a few days to get WER results.

aky15 commented

BatchNorm in torch cannot handle inputs with varying lengths, which are very common in ASR. Specifically, short utterances in the batch are padded with zeros, and these zeros should not be included in the mean and variance calculation. src/batchnorm_src is meant to handle this.
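A minimal sketch of the idea (hypothetical code, not the actual src/batchnorm_src implementation): build a frame-validity mask from the utterance lengths and compute the batch statistics only over valid frames.

import torch
import torch.nn as nn

class MaskedBatchNorm1d(nn.Module):
    # Hypothetical sketch: batch norm over (batch, time, feat) that ignores padded frames.
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x, lengths):
        # x: (batch, time, feat); lengths: (batch,) number of valid (unpadded) frames
        mask = (torch.arange(x.size(1), device=x.device)[None, :]
                < lengths[:, None]).unsqueeze(-1).to(x.dtype)   # (batch, time, 1)
        if self.training:
            n = mask.sum()                                      # count of valid frames
            mean = (x * mask).sum(dim=(0, 1)) / n
            var = (((x - mean) ** 2) * mask).sum(dim=(0, 1)) / n
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean.detach()
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var.detach()
        else:
            mean, var = self.running_mean, self.running_var
        y = (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
        return y * mask                                         # keep padded frames at zero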

Sorry to bother you again.

I finally got an initial result on aishell. I made two changes to get it working on my PC.

  1. Batch size was reduced to 100, because I have only one GPU with 12 GB of RAM.
  2. "batchnorm_src" was replaced with the built-in BatchNorm1d. I cannot find a setup script for a PyTorch 1.x C++ extension like ctc-crf/setup_1_0.py.

I killed the training process after epoch 17. Here is the progress report:

$ grep mean_cv_loss  my.6.log     
mean_cv_loss: -15.59308932104776
mean_cv_loss: -16.70415916442871
mean_cv_loss: -17.101018417713252
mean_cv_loss: -17.341455113610557
mean_cv_loss: -17.45301233335983
mean_cv_loss: -17.57776860969011
mean_cv_loss: -17.65706866064737
mean_cv_loss: -17.663358138328373
mean_cv_loss: -17.653572401889534
mean_cv_loss: -17.716543197631836
mean_cv_loss: -17.751277187258697
mean_cv_loss: -17.78889011116915
mean_cv_loss: -17.68128953090934
mean_cv_loss: -17.738837211076603
mean_cv_loss: -17.70960905385572
mean_cv_loss: -18.05926601498626
mean_cv_loss: -18.084275427529978

And the CER result

$ grep WER exp/decode_test/lattice/cer_* | utils/best_wer.sh 
%WER 6.03 [ 18937 / 314295, 481 ins, 844 del, 17612 sub ] exp/decode_test/lattice/cer_9

It looks really good!

Do you have a batchnorm_src setup script for PyTorch 1.x? I am not familiar with PyTorch.
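For context, I expected something like the following setup.py sketch using torch.utils.cpp_extension; the extension and source file names here are placeholders, since I don't know the actual layout of src/batchnorm_src.

# setup.py -- hypothetical sketch for building a PyTorch 1.x C++ extension;
# the names below are placeholders, not the real batchnorm_src layout.
from setuptools import setup
from torch.utils.cpp_extension import CppExtension, BuildExtension

setup(
    name="batchnorm_src",
    ext_modules=[
        CppExtension(name="batchnorm_src", sources=["batchnorm_src.cpp"])
    ],
    cmdclass={"build_ext": BuildExtension},
)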

aky15 commented

Hi Alex, happy to see that you got really good results.
We will implement the batchnorm_src setup for PyTorch 1.x later. A possible alternative is layer norm along the feature dimension, which is not affected by the length of the utterance.
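A minimal sketch of that alternative (the shapes below are illustrative only): nn.LayerNorm over the feature dimension computes mean/var per frame, so padded frames never contaminate the statistics of real frames.

import torch
import torch.nn as nn

# Illustrative shapes (hypothetical), not the values used in the egs.
batch, time, feat = 4, 200, 120
x = torch.randn(batch, time, feat)   # padded frames can simply stay zero

layer_norm = nn.LayerNorm(feat)      # normalizes over the last (feature) dimension
y = layer_norm(x)                    # each frame uses only its own mean/var,
                                     # so utterance length / zero padding is irrelevant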

I know many tricks to avoid the zero-padding issue. However, I want to share this work with researchers in Taiwan, so I hope it can reproduce your research works with meaningful math, as more than just another deep learning tool.

Hi Alex, nice to see you got a really nice result on AISHELL.
What do you mean by "reproduce your research works with meaningful math"?

I didn't expect the professor to come online in person, so I don't need to torture everyone with my broken English.
I currently work at ASUS. My colleagues are very interested in your research, but they ran into quite a few problems setting up the experimental environment, so I plan to clean up this code and share it with more researchers in Taiwan. Many studies nowadays ignore details like the zero-padding problem; most people just sort by length first and then train in mini-batches. According to aky15's explanation, batchnorm_src seems to solve the problem at its root, which I think is well worth keeping. That is why I hope aky15 can provide installation instructions for the PyTorch 1.x version.
Later on, if Taiwan holds events such as competitions, your research could serve as a baseline example so that more people use it.

@alex-ht I see. Thanks.
@aky15 Before implementing the batchnorm_src setup for PyTorch 1.x, you may check whether the URL mentioned by alex-ht is already a correct solution.

Hi, after the replacement the result is CER 6.10%, again stopping after epoch 17. The relative change is only 1.1%, which I think is almost within the margin of error.

Also, I found that the default configuration uses a 6-layer BLSTM setup, which is not affected by batchnorm_src at all 🤣.
I will add a parameter option to train.py for choosing between different DNN architectures, and then continue testing; a rough sketch of that option follows below.
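The sketch below shows one way such an option could look; the flag name and the toy model builders are hypothetical, not the existing train.py code.

import argparse
import torch.nn as nn

# Hypothetical sketch of an architecture-selection option for train.py.
def build_blstm(idim, hdim, layers=6):
    return nn.LSTM(idim, hdim, num_layers=layers, bidirectional=True, batch_first=True)

def build_lstm(idim, hdim, layers=6):
    return nn.LSTM(idim, hdim, num_layers=layers, bidirectional=False, batch_first=True)

BUILDERS = {"blstm": build_blstm, "lstm": build_lstm}

parser = argparse.ArgumentParser(description="CTC-CRF training (sketch)")
parser.add_argument("--arch", choices=sorted(BUILDERS), default="blstm",
                    help="which DNN architecture to build")
args = parser.parse_args()

model = BUILDERS[args.arch](idim=120, hdim=320)   # toy dimensions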

Hi @ZhijianOu @aky15
@aky15, have you confirmed whether maskedbatchnorm1d-py is correct? It looks right to me; the only thing I'm not sure about is the last line, i.e. whether changing return inp to return inp, lengths makes any difference.
Also, I have never managed to compile the batchnorm_src code correctly under PyTorch 0.4.1, so I cannot offer both a 1.5 and a 0.4 version for people to choose from.

Taiwan is about to hold a Taiwanese Hokkien ASR competition, so I'm checking in on the status. To make the deadline, I plan to share a PyTorch 1.5 environment with the ASR community in Taiwan first. If you don't mind, Professor, I will share it from my own fork first and attach a link to your website.

@alex-ht Thanks for the message.
I think @aky15 can reply to the questions regarding

  • implementing the batchnorm_src setup for PyTorch 1.x
  • checking whether maskedbatchnorm1d-py is a correct solution

I suggest you make a PR named refactor. You can mainly take care of aishell; I will ask others to update the other egs, and then we will do the merge. After that we can all work from the refactored code.

When do you expect to release it for use in the challenge?

OK, once I have cleaned up the scripts for librispeech and Taiwan's Formosa corpus, I will send a PR first.

Looking at the current schedule, finishing before September would be best:

2020/07/01 --- Training Data Release
2020/09/01 --- Pilot-Test (dry-run only) Data Release
2020/09/15 --- Pilot-Test (dry-run only) Result Submission
2020/09/30 --- Pilot-Test (dry-run only) Performance Notification

aky15 commented

@alex-ht Hi, Alex. In our implementation, batchnorm_src is built by default with torch 0.4* compiled from source. It's a bit more complicated, and we ran into some problems adding support for torch 1.*. We found on some datasets that layernorm yields comparable (or even better) results and solves the zero-padding problem more naturally, since the other samples in the batch are not involved in the mean/var computation. We have been testing the performance of layernorm thoroughly and recommend using it in our neural network model.

Got it with thanks, @aky15 !

Also, I cannot get librispeech training to converge with the reduced batch size. I will send a PR in about two or three days; please then try out the aishell and librispeech scripts.