How can I train with my dataset?
dangvansam opened this issue · 4 comments
I have a dataset: one folder 'wav' containing the .wav files, and one text file with one line per WAV file, in the format name_wav text_of_wav.
So, how can I train with this data? Thanks so much, I'm a beginner.
p225_001 Please call Stella.
p225_002 Ask her to bring these things with her from the store.
p225_003 Six spoons of fresh snow peas, five thick slabs of blue cheese, and maybe a snack for her brother Bob
.....
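Before converting anything, it can help to check that every WAV file has exactly one transcript line and vice versa. Below is a minimal sketch. It assumes the transcript is a UTF-8 text file where each line holds the WAV file name without the .wav extension, then whitespace, then the text. The function names load_transcripts and check_dataset are hypothetical helpers, not part of the project.

```python
from pathlib import Path


def load_transcripts(transcript_file):
    """Parse lines of the form '<name_wav> <text_of_wav>' into a dict (hypothetical helper)."""
    transcripts = {}
    for line in Path(transcript_file).read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)  # split only on the first whitespace run
        if not parts:
            continue  # skip blank lines
        name = parts[0]
        text = parts[1] if len(parts) > 1 else ""
        transcripts[name] = text
    return transcripts


def check_dataset(wav_dir, transcript_file):
    """Return (WAV files without a transcript, transcripts without a WAV file)."""
    transcript_names = set(load_transcripts(transcript_file))
    wav_names = {p.stem for p in Path(wav_dir).glob("*.wav")}  # file names minus '.wav'
    missing_text = sorted(wav_names - transcript_names)
    missing_wav = sorted(transcript_names - wav_names)
    return missing_text, missing_wav
```

For example, check_dataset("wav", "text.txt") returning two empty lists means the folder and the text file line up.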
Hi @dangvansam98, thanks for your question. It appears that my documentation is lagging behind since I switched from a TXT file to CSV. I'll try to update it over the weekend.
If I remember correctly, the model expects a data/train.csv in its data directory, with the following format:
path;label;length
timit/TIMIT/TEST/DR1/FAKS0/SI1573.WAV;his captain was thin and haggard and his beautiful boots were worn and shabby;4.9728125
timit/TIMIT/TEST/DR1/FAKS0/SI2203.WAV;the reasons for this dive seemed foolish now;3.513625
...
Where path is the relative WAV path from the DATA_DIR/corpus/ directory, label is by default the all-lowercase transcription without punctuation, and length is the audio length in seconds.
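Converting a folder of WAV files plus a "name_wav text_of_wav" transcript into this format could look roughly like the sketch below. It rests on several assumptions: the WAV files are copied to DATA_DIR/corpus/wav/ so each path is relative to DATA_DIR/corpus/; train.csv is written directly into DATA_DIR (adjust if the model looks elsewhere); the WAVs are uncompressed PCM, which Python's wave module requires; and "without punctuation" is approximated by dropping every character that is not a word character or whitespace, which may not match the model's exact alphabet. The function names are hypothetical.

```python
import csv
import re
import wave
from pathlib import Path


def normalize_label(text):
    """Lowercase and strip punctuation (assumed label convention)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return " ".join(text.split())  # collapse repeated whitespace


def wav_length_seconds(wav_path):
    """Duration in seconds of an uncompressed PCM WAV file."""
    with wave.open(str(wav_path), "rb") as w:
        return w.getnframes() / w.getframerate()


def build_train_csv(data_dir, transcript_file, wav_subdir="wav"):
    """Write DATA_DIR/train.csv with 'path;label;length' rows (hypothetical layout).

    WAV files are assumed to live in DATA_DIR/corpus/<wav_subdir>/, so every
    path written to the CSV is relative to DATA_DIR/corpus/.
    """
    data_dir = Path(data_dir)
    corpus_dir = data_dir / "corpus"
    rows = []
    for line in Path(transcript_file).read_text(encoding="utf-8").splitlines():
        parts = line.strip().split(maxsplit=1)
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        name, text = parts
        rel_path = f"{wav_subdir}/{name}.wav"
        length = wav_length_seconds(corpus_dir / rel_path)
        rows.append((rel_path, normalize_label(text), length))
    out_path = data_dir / "train.csv"
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        # Semicolon-delimited with a header row, matching the sample above.
        writer = csv.writer(f, delimiter=";", lineterminator="\n")
        writer.writerow(["path", "label", "length"])
        writer.writerows(rows)
    return out_path
```

With the WAVs in data/corpus/wav/, calling build_train_csv("data", "text.txt") would write data/train.csv with rows such as wav/p225_001.wav;please call stella;<seconds>.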
Thank you for your answer 👍
Glad to hear it. By the way, if you are using a free speech corpus, could you link it?