This package assists in generating training data for fine-tuning Whisper by synthesizing .srt files from sentences, mimicking real data through sentence concatenation.
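
To illustrate the idea of building an `.srt` file by concatenating sentences, here is a minimal sketch. It is not the package's actual implementation: the `Clip` class and the helper names `format_timestamp` and `clips_to_srt` are hypothetical, and it assumes the duration of each audio clip is already known and that clips are joined back to back without pauses.

```python
from dataclasses import dataclass


@dataclass
class Clip:
    """One sentence-level sample (hypothetical structure for this sketch)."""
    sentence: str
    duration_s: float  # length of the audio clip in seconds


def format_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def clips_to_srt(clips: list[Clip]) -> str:
    """Place clips back to back and return the matching SRT text."""
    blocks = []
    start = 0.0
    for index, clip in enumerate(clips, start=1):
        end = start + clip.duration_s
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(start)} --> {format_timestamp(end)}\n"
            f"{clip.sentence}\n"
        )
        start = end  # the next sentence starts where this one ends
    # SRT cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [Clip("Hello there.", 1.5), Clip("How are you?", 2.25)]
    print(clips_to_srt(demo))
```

Each cue starts where the previous one ends, so the subtitle timings line up with the concatenated audio.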
- Data File (.tsv):
  - Create a `.tsv` file with two required columns:
    - `path`: The relative path to the `.mp3` file.
    - `sentence`: The text corresponding to the audio file.
  - Optional: If a `client_id` column is included, it can be used to increase the probability that consecutive sentences come from the same speaker. Refer to `generate_fold` in `src/whisper_prep/generation/generate.py` for additional features.
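
As a rough illustration of the data file described above, the following sketch writes a small `.tsv` with Python's standard `csv` module. The file paths, sentences, speaker IDs, and the helper name `write_tsv` are made up for the example. It also assumes the file starts with a header row naming the columns, which should be checked against what the package actually expects.

```python
import csv

# Hypothetical example rows; replace with your own clips and transcripts.
EXAMPLE_ROWS = [
    {"path": "clips/sample_0001.mp3", "sentence": "Hello there.", "client_id": "speaker_a"},
    {"path": "clips/sample_0002.mp3", "sentence": "How are you?", "client_id": "speaker_a"},
]


def write_tsv(rows, out_path):
    """Write rows to a tab-separated file with a header line."""
    # "path" and "sentence" are required; "client_id" is optional.
    fieldnames = ["path", "sentence", "client_id"]
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(
            f, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
        )
        writer.writeheader()
        writer.writerows(rows)
```

Calling `write_tsv(EXAMPLE_ROWS, "data.tsv")` produces a three-line file: the header followed by one line per clip.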
- Configuration File (.yaml):
  - Set up a `.yaml` configuration file. An example can be found at `example.yaml`.
- Running the Generation Script:
  - Run `whisper_prep -c <path_to_your_yaml_file>`.
- Upload to Huggingface.com:
Vincenzo Timmel - vincenzo.timmel@fhnw.ch
Distributed under the MIT License. See `LICENSE` for more information.