This package allows you to finetune titanet_large on librispeech.
-
train.py
allows you to fine-tune the model. This script should be run first before executing the other scripts -
scripts/eval_cosine_sim.py
is for evaluating the cosine similarity of some segments from thedata
directory. To run the script, usepython scripts/eval_cosine_sim.py
- data is the segmented recordings split into train and test (The smaller dataset). The original recordings are also included in the
recordings
subdirectory alongside their transcripts (in csv format), and annotations in rttm formats (generated from transcripts). - scripts folder contains the scripts for this package Some folders will be created when the package is being run, like 'nemo_experiments' and 'segments'
- scripts/old contain the old scripts for the first training on the small dataset (not librispeech).
- This package runs with the latest version of Nemo and requires python 3.10.12. Nemo does not support 3.11 at this time.
- install correct version of torch. For instance, run
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
- on your terminal:
apt-get update && apt-get install -y libsndfile1 ffmpeg
- you also need to have soX on your system
sudo apt-get install sox
- Next run
pip install -r requirements.txt
- Next, install run
pip install -e .
to install the package in your environment. - run
python scripts/train.py
- When the file has ran, you can call the fine-tuned model from the directory using the
restore_from
method of the 'EncDecSpeakerLabelModel' class. The loaded model can be used for inference (the model will be saved in a 'nemo_experiments' folder)