Can't create vocabulary

Question

Opened this issue 7 years ago · 6 comments

I have tried many times to run the program with different parameters and some test files, but it keeps failing with an error.

Here is the error shown on the screen:
grammar_tools.cpp (403) : generate | Something went wrong while creating vocabulary!

PS: test.wav is just white noise, and test.srt just contains some test subtitles.
test.zip

Answer 1 · 2017-12-12T02:39:36.000Z

The parameters are "--wav test.wav --srt test.srt".

Answer 2 · 2017-12-12T02:49:20.000Z

Could you give us the output before the error?
Also, please check if you have all dependencies installed.

Answer 3 · 2017-12-14T02:56:56.000Z

/usr/bin/valgrind --tool=memcheck --xml=yes --xml-file=/tmp/valgrind --gen-suppressions=all --leak-check=full --leak-resolution=med --track-origins=yes /home/retrospect/Documents/code-in/CCAligner/src/cmake-build-debug/ccaligner -wav test.wav -srt test.srt

CCAligner 0.03 Alpha [Shubham]
Word by Word Audio-Subtitle Synchronization
Saurabh Shrivastava | saurabh.shrivastava54@gmail.com
https://github.com/saurabhshri/CCAligner

[12-14 10:53:14][Debug] Initialising Aligner using PocketSphinx
[12-14 10:53:14][Debug] Audio Filename: test.wav Subtitle filename: test.srt
[12-14 10:53:14][Info] Reading and decoding audio samples...
[12-14 10:53:14][Debug] Begin reading WAV file
[12-14 10:53:14][Debug] Opening mode chosen: readFile, proceeding
[12-14 10:53:14][Debug] Trying to read from file : test.wav
[12-14 10:53:14][Debug] Reading file data
[12-14 10:53:15][Debug] File data read and stored in buffer
[12-14 10:53:15][Debug] Processing data and extracting samples
[12-14 10:53:15][Debug] Checking chunkID, should be RIFF
[12-14 10:53:15][Debug] Wave File chunkID verification successful
[12-14 10:53:15][Debug] Begin decoding wave file
[12-14 10:53:15][Debug] File format is identified as WAV
[12-14 10:53:15][Debug] Finding FMT and DATA subchunks
[12-14 10:53:15][Debug] FMT index : 12 , DATA index : 88
[12-14 10:53:15][Debug] PCM : True
[12-14 10:53:15][Debug] MONO : True
[12-14 10:53:15][Debug] Sample Rate 16KHz : True
[12-14 10:53:15][Debug] BitRate 16 bits/sec : True
[12-14 10:53:15][Debug] Number of samples : 64000
[12-14 10:53:15][Debug] Reading samples
[12-14 10:53:15][Debug] Successfully decoded
[12-14 10:53:15][Debug] File decoded successfully
[12-14 10:53:15][Debug] Generating Grammar based on subtitles, Grammar Name: 6
[12-14 10:53:16][Info] Generating language model and grammar files...
[12-14 10:53:16][Info] Note: You have chosen to generate a dictionary. Based on your TensorFlow configuration,
[12-14 10:53:16][Info] this may take some time, please be patient. For alternatives, see docs.
[12-14 10:53:16][Debug] Creating temporary directories at tempFiles/
[12-14 10:53:16][Debug] Directories created successfully!
[12-14 10:53:16][Info] Creating Corpus : tempFiles/corpus/corpus.txt
[12-14 10:53:16][Info] Creating Phonetic Corpus : tempFiles/corpus/phoneticCorpus.txt
[12-14 10:53:20][Debug] Creating vocabulary...
[12-14 10:53:20][Debug] Vocabulary created!
[12-14 10:53:20][Info] Creating the Dictionary, this might take a little time depending on your TensorFlow configuration : tempFiles/dict/complete.dict
Traceback (most recent call last):
  File "/usr/local/bin/g2p-seq2seq", line 11, in <module>
    load_entry_point('g2p-seq2seq==5.0.0a0', 'console_scripts', 'g2p-seq2seq')()
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/app.py", line 82, in main
  File "build/bdist.linux-x86_64/egg/g2p_seq2seq/g2p.py", line 72, in load_decode_model
RuntimeError: Model not found in g2p-seq2seq-cmudict/
[12-14 10:53:21][Fatal] /home/retrospect/Documents/code-in/CCAligner/src/lib_ccaligner/grammar_tools.cpp (191) : GenerateDict | Something went wrong while creating dictionary!
terminate called after throwing an instance of 'UnknownError'
what(): [12-14 10:53:21][Fatal] /home/retrospect/Documents/code-in/CCAligner/src/lib_ccaligner/grammar_tools.cpp (191) : GenerateDict | Something went wrong while creating dictionary!

Process finished with exit code 134 (interrupted by signal 6: SIGABRT)
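
For readers unfamiliar with the exit code: many shells and IDEs report a process killed by a signal as 128 plus the signal number, so 134 corresponds to signal 6. The sketch below decodes such a code; the helper name describe_exit_code is made up for illustration and is not part of CCAligner.

```python
import signal


def describe_exit_code(code):
    """Decode a shell-style exit code.

    Assumes the common convention that values above 128 mean
    "terminated by signal (code - 128)".
    """
    if code > 128:
        return "terminated by " + signal.Signals(code - 128).name
    return "exited with status " + str(code)


print(describe_exit_code(134))
```

Here the abort comes from the uncaught UnknownError shown just above, not from a memory fault.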

Answer 4 · 2017-12-14T03:00:53.000Z

I have already installed all the dependencies, but now I get a different error.

Answer 5 · 2017-12-14T04:58:20.000Z

RuntimeError: Model not found in g2p-seq2seq-cmudict/

It says the model was not found.
Please check that you have completed the following step:

Make sure the model folder and g2p-seq2seq-cmudict are in the directory where you are compiling CCAligner.

Both are in install/, and you need to copy them manually into your program's working directory. You also need quick_lm.pl to be available.
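
The manual copy step can be scripted. The following is only a rough sketch, not part of CCAligner: it assumes the model folder is literally named model, that quick_lm.pl can also be found in install/ (the thread does not say where it lives), and the helper names missing_resources and copy_missing are made up for illustration.

```python
import shutil
from pathlib import Path

# Names taken from the thread; "model" as the literal folder name and the
# location of quick_lm.pl are assumptions, adjust them to your checkout.
REQUIRED = ("model", "g2p-seq2seq-cmudict", "quick_lm.pl")


def missing_resources(work_dir, required=REQUIRED):
    """Return the required names that do not exist in work_dir."""
    work_dir = Path(work_dir)
    return [name for name in required if not (work_dir / name).exists()]


def copy_missing(install_dir, work_dir, required=REQUIRED):
    """Copy resources missing from work_dir out of install_dir.

    Returns the names that could not be found in install_dir either.
    """
    install_dir, work_dir = Path(install_dir), Path(work_dir)
    still_missing = []
    for name in missing_resources(work_dir, required):
        src = install_dir / name
        if src.is_dir():
            shutil.copytree(src, work_dir / name)   # folders such as the model
        elif src.is_file():
            shutil.copy2(src, work_dir / name)      # single files
        else:
            still_missing.append(name)
    return still_missing


if __name__ == "__main__":
    # Hypothetical layout: run from the directory that contains install/
    # and from which ccaligner is started.
    left = copy_missing("install", ".")
    if left:
        print("Still missing:", ", ".join(left))
    else:
        print("All required files are in place.")
```

Run it once from the working directory before starting ccaligner; anything it reports as still missing has to be located and copied by hand.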

Answer 6 · 2017-12-19T00:46:31.000Z

Thank you! It works.
