How can I train with my dataset?

Question

How can I train with my dataset?

dangvansam opened this issue 6 years ago · 4 comments

I have a dataset: one folder 'wav' containing .wav files, and one text file with one line per wav file, in the format name_wav text_of_wav.
So, how can I train with this data? Thanks so much, I'm a beginner.

Answer 1 · 2019-04-12T04:58:02.000Z

p225_001 Please call Stella.
p225_002 Ask her to bring these things with her from the store.
p225_003 Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob
.....

Answer 2 · 2019-04-12T09:36:25.000Z

Hi @dangvansam98, thanks for your question. It appears that my documentation has been lagging behind since I switched from a TXT file to CSV. I'll try to update it over the weekend.

If I remember correctly, the model expects a data/train.csv file in its data directory, with the following format:

path;label;length
timit/TIMIT/TEST/DR1/FAKS0/SI1573.WAV;his captain was thin and haggard and his beautiful boots were worn and shabby;4.9728125
timit/TIMIT/TEST/DR1/FAKS0/SI2203.WAV;the reasons for this dive seemed foolish now;3.513625
...

Here, path is the WAV path relative to the DATA_DIR/corpus/ directory. By default, label is the all-lowercase transcription without punctuation, and length is the audio length in seconds.
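
For a dataset laid out like the one in the question, a conversion along these lines might work. This is only a rough sketch, not part of the project: the function names (`normalize_label`, `wav_length_seconds`, `build_csv`) are made up for illustration, and it assumes that each transcript line is `name text` with the name lacking the `.wav` extension, that the wav folder sits somewhere inside the corpus directory, and that stripping ASCII punctuation (including apostrophes) is an acceptable way to normalize labels.

```python
import csv
import string
import wave
from pathlib import Path


def normalize_label(text):
    """Lowercase the transcription and strip ASCII punctuation (assumed rule)."""
    table = str.maketrans("", "", string.punctuation)
    return " ".join(text.lower().translate(table).split())


def wav_length_seconds(wav_path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(wav_path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def build_csv(transcript_path, wav_dir, corpus_dir, csv_path):
    """Convert 'name text' lines into 'path;label;length' rows."""
    wav_dir, corpus_dir = Path(wav_dir), Path(corpus_dir)
    with open(transcript_path, encoding="utf-8") as src, \
         open(csv_path, "w", encoding="utf-8", newline="") as dst:
        writer = csv.writer(dst, delimiter=";", lineterminator="\n")
        writer.writerow(["path", "label", "length"])
        for line in src:
            line = line.strip()
            if not line:
                continue  # skip blank lines in the transcript
            # Assumption: the first token is the wav name without extension.
            name, _, text = line.partition(" ")
            wav_path = wav_dir / f"{name}.wav"
            writer.writerow([
                # Path relative to the corpus directory, as described above.
                wav_path.relative_to(corpus_dir).as_posix(),
                normalize_label(text),
                wav_length_seconds(wav_path),
            ])
```

Calling something like `build_csv("transcript.txt", "data/corpus/wav", "data/corpus", "data/train.csv")` would then write the CSV; the paths here are placeholders and should be adjusted to wherever DATA_DIR actually points.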

Answer 3 · 2019-04-16T03:39:57.000Z

thank you for your answer 👍

Answer 4 · 2019-04-17T12:38:31.000Z

Glad to hear it. By the way, in case you are using a free speech corpus, could you link it?