yongxuUSTC/DNN-for-speech-enhancement

pfile tail is Not correct.

xjia520 opened this issue · 12 comments

Hi Dr. Xu, I ran into a problem when running the training demo. Could you please help?
Following "README.md", I used the QuickNet tools to generate the pfiles, including "fea.pfile", "targ.pfile" and "fea.norm". As a test, I used only two pairs of WAV files, like this:

feacalc sa1.wav sa2.wav -output targ.pfile
feacalc sa1_car_snr1.wav sa2_car_snr1.wav -output fea.pfile
pfile_norm -i fea.pfile -o fea.norm

I then used these three files to configure "finetune_DNN_speech_enhancement_dropout_NAT_linux.pl" and ran it on Linux, but every log file shows the message "pfile tail is Not correct.".
Did I prepare the pfiles the wrong way? How should I prepare them? I also don't quite understand the tips in "how_to_get_pfile.txt", which say that two files, ".len" and ".scp", should be prepared first. What exactly is the format of the pfiles? Could you please give a small example?
I would be very grateful.

Hi, you should first use the HTK toolset (or the tool in my decoding tool) to extract log-power spectra features (not wav), and then use the QuickNet tools to merge them into one big pfile. ".len" holds the number of frames of each feature file. ".scp" holds the path of each feature file. You can read my decoding MATLAB code; it may help you understand how to extract log-power spectra features.
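To make the relationship between the two list files concrete, here is a minimal Python sketch (not part of the repository) that checks a ".scp"/".len" pair. The function name `check_lists` and all file names are hypothetical. The only assumptions taken from this thread are that ".scp" holds one feature-file path per line and ".len" holds one frame count per line, in the same order.

```python
import os


def check_lists(scp_path, len_path):
    """Return a list of human-readable problems found in a .scp/.len pair."""
    problems = []
    with open(scp_path, "rb") as f:
        scp_raw = f.read()
    with open(len_path, "rb") as f:
        len_raw = f.read()

    # Carriage returns indicate Windows (CRLF) line endings.
    if b"\r" in scp_raw:
        problems.append("scp file contains carriage returns (Windows format)")
    if b"\r" in len_raw:
        problems.append("len file contains carriage returns (Windows format)")

    # One entry per non-empty line.
    paths = [l.strip() for l in scp_raw.decode().splitlines() if l.strip()]
    counts = [l.strip() for l in len_raw.decode().splitlines() if l.strip()]

    if len(paths) != len(counts):
        problems.append(f"{len(paths)} paths but {len(counts)} frame counts")
    for p in paths:
        if not os.path.isabs(p):
            problems.append(f"not a full path: {p}")
        elif not os.path.isfile(p):
            problems.append(f"missing feature file: {p}")
    for c in counts:
        if not c.isdigit():
            problems.append(f"frame count is not a whole number: {c}")
    return problems
```

An empty result only means the two lists are mutually consistent; it does not verify that each count matches the actual number of frames stored in the feature file.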

OK. I will study your decoding MATLAB code first. Thank you for your help!

@yongxuUSTC
Hi Dr. Xu, I have studied your decoding MATLAB code, but I still have some problems. Could you give me some advice?
I used the tool "Wav2LogSpec.exe" to extract all the ".lsp" files, and I prepared the ".len" and ".scp" files according to your suggestion. The length of each ".lsp" file is calculated from the corresponding ".wav" file, with a frame length of 16 ms. The ".scp" file contains the paths of all the ".lsp" files, in the same order as the ".len" file. I think this step is OK, but I have no way to confirm it.

Then I changed the extension of "how_to_get_pfile.txt" to ".pl" and configured it. Is this step OK? I assumed that you have already tested this script as "how_to_get_pfile.pl" and that it runs correctly.

Then I ran the script, and it returned the errors below:

xjia@server02:~/SEDNN/pfile_extract$ perl how_to_get_pfile.pl
ok
fopen_gz: couldn't fopen('/home/xjia/SEDNN/pfile_extract/timit_10_sentences_lsp.scp2','r')
feacat: Error: Unable to input data list file '/home/xjia/SEDNN/pfile_extract/timit_10_sentences_lsp.scp2'
fopen_gz: couldn't fopen('/home/xjia/SEDNN/pfile_extract/timit_10_sentences_lsp.scp3','r')
feacat: Error: Unable to input data list file '/home/xjia/SEDNN/pfile_extract/timit_10_sentences_lsp.scp3'

What is wrong with my steps, and how can I use the QuickNet tools to get the right pfiles? Could you please give me more help? Extracting the features seems really complicated to me. Could you give a small example? I would really appreciate it. Thank you VERY MUCH!

Hi, I have updated the detailed steps for getting the pfile and added some tools:

https://github.com/yongxuUSTC/DNN-for-speech-enhancement/blob/master/how_to_get_pfile.txt

Detailed steps to get a pfile

step1: use the tool "Wav2LogSpec.exe" to extract all ".lsp" files from raw audio files. (RAW means no header; WAV-format audio files contain header info, and help/wav2raw.exe can remove it.)

step2: use "toolbox/le2be.m" to convert the little-endian (le) ".lsp" features into big-endian (be) ".lsp_be" features. This is a little tricky; maybe you can find a better way.

step3: use "toolbox/randomlist.pl" to randomize your scp lists

step4: prepare the ".len" text file (the number of frames of each ".lsp" file, one number per line). You can use "toolbox/get_len_from_scp.pl"

#".len" example:
#120
#234
#451
#99
#...

step5: use the ".scp" (listing the be-format feature files, not the le-format ones) and the ".len" to get the pfile as follows.

summary: "Wav2LogSpec.exe" can only extract "little endian (le)" format features from RAW audio files, but the QuickNet tools only accept "big endian (be)" feature files. That is why you need "le2be". Normally, the HTK toolset (e.g. HCopy.exe) can directly extract "be" format features (log-power spectra), but I have never tried it.

summary: this is a little tricky and ugly; I am planning to rewrite all of the DNN-SE code in TensorFlow.
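For readers without MATLAB, the little-endian to big-endian conversion in step2 can be sketched in Python. This is not a port of "toolbox/le2be.m", whose contents are not shown in this thread. It assumes the ".lsp" files use the standard 12-byte HTK header (nSamples and sampPeriod as 32-bit integers, sampSize and parmKind as 16-bit integers) followed by uncompressed 4-byte floats, which has not been confirmed for the output of "Wav2LogSpec.exe". The function name `le_to_be` is hypothetical.

```python
import struct


def le_to_be(data: bytes) -> bytes:
    """Byte-swap one little-endian HTK-style feature file (given as bytes)."""
    if len(data) < 12:
        raise ValueError("file too short for a 12-byte HTK header")
    # Header: nSamples (int32), sampPeriod (int32), sampSize (int16), parmKind (int16).
    n_frames, period, frame_bytes, kind = struct.unpack("<iihh", data[:12])
    body = data[12:]
    # Assumption: each frame is a whole number of 4-byte floats.
    if frame_bytes % 4 != 0 or len(body) != n_frames * frame_bytes:
        raise ValueError("header does not match a little-endian float32 layout")

    # Reverse the byte order of every 4-byte word without interpreting the values.
    swapped = bytearray(len(body))
    for k in range(4):
        swapped[k::4] = body[3 - k::4]

    header = struct.pack(">iihh", n_frames, period, frame_bytes, kind)
    return header + bytes(swapped)
```

The size check doubles as a sanity test: if a file is already big-endian, the header read as little-endian will almost certainly not match the file length, and the function raises instead of silently corrupting the data.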

Hi Dr. Xu, I am not clear about step4 (prepare the ".len" text file). "toolbox/get_len_from_scp.pl" seems to only read a "*.len" file, and I want to know how to generate the ".len" file in the first place. Could you give me some help?
Thanks.

I have now prepared the ".len" and ".scp" files. The ".scp" file was produced by following the detailed steps (step1 to step3) you provided. The length of each ".lsp" file is calculated like this:
[y,Fs,bits] = wavread('test.wav','size')
Tm=y(1)/Fs;
Len=Tm/0.016;
The calculated Len is 226.8008.
Then I ran the script, and it returned the errors below:
QuickNet(QN_InFtrStream_HTK.feacat): WARNING - Sample format in HTK file (unknown) is unknown (0).
QuickNet(QN_InFtrStream_HTK.feacat): ERROR - EOF reading 1 frms at frame 225 in seg 0 of file (unknown).

Could you please give me more help?

You should first understand the concept of a "frame" in speech processing.
Your method of getting the ".len" file is wrong. ".len" stores the number of frames of each feature file. You can get ".len" with "toolbox/getlenscp.exe in.scp out.len", or use "readhtk.m" to read the number of frames.
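A little arithmetic shows why duration divided by 0.016 is not a frame count. The sketch below uses the common framing formula 1 + floor((N - window) / shift), with no padding. The 16 kHz sampling rate and the 512-sample window are assumptions made for illustration (the thread only mentions 16 ms), and the sample count is back-computed from the 226.8008 figure quoted above. The function name `num_frames` is hypothetical.

```python
def num_frames(num_samples, window, shift):
    """Number of full analysis frames that fit in a signal (no padding)."""
    if num_samples < window:
        return 0
    return 1 + (num_samples - window) // shift


fs = 16000                  # assumed sampling rate in Hz
shift = 16 * fs // 1000     # 16 ms shift -> 256 samples
window = 512                # assumed window length in samples (32 ms)

# 226.8008 * 256 is roughly 58061 samples.
num_samples = 58061

print(num_frames(num_samples, window, shift))  # 225
print(num_samples / shift)                     # about 226.8, not a whole number
```

Under these assumptions the count is 225, which is at least consistent with the error message stopping at frame 225, though the thread does not confirm the actual window length. In any case a ".len" entry has to be a whole number read from the feature file itself, not derived from the WAV duration.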

The features are always stored as (num_of_frame, fea_dim); you can see this info using "readhtk.m".

wav2logspec.exe can extract the features, but in "le" format, so you should convert them into "be" format and then use feacat to combine the features and the len file. Note that the ".len" file should be in Unix format (not Windows format).
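As an illustration of reading the frame count from the file rather than from the audio duration, here is a minimal Python sketch. It is not a reimplementation of "getlenscp.exe" or "readhtk.m", whose behaviour is not shown in this thread. It assumes the feature files start with the standard 12-byte HTK header, whose first 32-bit integer is the number of frames, and that the files are already big-endian. The function names `htk_frame_count` and `write_len_file` are hypothetical.

```python
import struct


def htk_frame_count(path, byteorder=">"):
    """Read nSamples (the frame count) from a 12-byte HTK header.

    byteorder is ">" for big-endian files and "<" for little-endian ones.
    """
    with open(path, "rb") as f:
        header = f.read(12)
    if len(header) < 12:
        raise ValueError(f"{path}: file too short for an HTK header")
    n_frames, _period, _frame_bytes, _kind = struct.unpack(byteorder + "iihh", header)
    return n_frames


def write_len_file(scp_path, len_path, byteorder=">"):
    """Write one frame count per line, in .scp order, with Unix line endings."""
    with open(scp_path) as scp:
        paths = [line.strip() for line in scp if line.strip()]
    # newline="\n" keeps LF endings even when this runs on Windows.
    with open(len_path, "w", newline="\n") as out:
        for p in paths:
            out.write(f"{htk_frame_count(p, byteorder)}\n")
    return len(paths)
```

Because the ".len" file is generated from the same ".scp" that is passed to feacat, the two lists stay in the same order by construction.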

OK, thank you very much. I will try it.

Hi Dr. Xu, I tried to prepare the pfile from a single wave file with the steps below.
On Windows:
Step 1: use wav2raw.exe to remove the header info from the .wav file.
Step 2: use wav2logSpec.exe to extract the ".lsp" file from the raw audio file.
(I confirmed the two steps above by comparing the generated .lsp file with the one generated by step1_DNNenh_for16kHz.m. I also checked the generated .raw file against the original .wav file.)
Step 3: use step1_DNNenh_for16kHz.m to convert from little endian. I confirmed this step by checking the generated file.
Step 4: use GetLenScp.exe to get the .len file, which contains a single number.
On Linux:
Step 5: use the dos2unix command to convert the .len file.
Step 6: use feacat -period 16.0 -ipformat htk -deslenfile .len -lists -o .scp out.pfile.

But I still got the error:
ERROR - EOF reading 1 frms at frame 225 in seg 0 of file (unknown).

I really do not know what is wrong. Could you give me some advice?

Hi Dr. Xu, it is very kind of you to give detailed steps to help me extract the feature files. Thank you very much!

I can now prepare all the ".lsp" files (big endian), the ".len" file and the ".scp" file with the tools you provided, and they are all in Unix format. However, when I run the "how_to_get_pfile.pl" script, an error still appears, as below:

fopen_gz: couldn't fopen('/home/xjia/SEDNN/pfile_extract/timit_10sentences_lsp_16k_clean_be_random_unix.scp4','r')
feacat: Error: Unable to input data list file '/home/xjia/SEDNN/pfile_extract/timit_10sentences_lsp_16k_clean_be_random_unix.scp4'
fopen_gz: couldn't fopen('/home/xjia/SEDNN/pfile_extract/timit_10sentences_lsp_16k_clean_be_random_unix.scp2','r')
feacat: Error: Unable to input data list file '/home/xjia/SEDNN/pfile_extract/timit_10sentences_lsp_16k_clean_be_random_unix.scp2'

From the error messages, it seems that the following line in the "how_to_get_pfile.pl" script does not run correctly:
system("feacat -period 16.0 -ipformat htk -deslenfile $len_scp$i -lists -o $fea_tr$i $fea_scp$i");### frame shift = 16ms, attention here

Since I am not very familiar with the QuickNet tools, I don't know exactly what the problem is, and I would like to ask for your help. All the error messages say that the file ".scp4" (or one with another number) cannot be opened, but the file I prepared is just ".scp", with no numeric suffix. I don't understand this. Could you please give me some advice?
Thank you!
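The numeric suffix in the error messages is consistent with how the quoted `system(...)` line is written: in Perl, `$fea_scp$i` concatenates the list name with the value of `$i`, so feacat is asked for "<list>.scp2", "<list>.scp4", and so on. The loop that sets `$i` is not shown in this thread, so the sketch below is only a guess at its shape. The function names, the index values and the file names are hypothetical. It is written in Python purely for illustration and lists which suffixed files such a run would look for and which of them are missing.

```python
import os


def expected_lists(fea_scp, indices):
    """Mimic the string concatenation "$fea_scp$i" for each index."""
    return [f"{fea_scp}{i}" for i in indices]


def missing_lists(fea_scp, indices):
    """Return the suffixed list files that do not exist on disk."""
    return [p for p in expected_lists(fea_scp, indices) if not os.path.isfile(p)]


# Hypothetical usage: a list name plus a guessed set of indices.
for path in missing_lists("timit_10_sentences_lsp.scp", [2, 3, 4]):
    print("feacat would fail to open:", path)
```

If only a single unsuffixed ".scp" exists, either the list would need to be split into files carrying those suffixes (and likewise for `$len_scp$i`), or the `$i` would need to be dropped from the command. Which of the two the script intends is not stated in this thread.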

Did you give the full path of each feature file?
Sorry, I think these are basic Linux problems; try searching Google for the answers. Or you can dig into the source code of the pfile tools; they are open source.
I may not have much time to help you solve every problem you come across.