Transcribe audio without pronouncing the punctuation
I developed Wav2Vec4Humans because I didn't understand why we still had to talk like robots when speaking to our "smart" objects in the age of self-driving cars.
This project creates speech recognition models that also output punctuation so people can talk naturally.
It is based on fine-tuning a pre-trained Wav2Vec2 model using HuggingFace.
The following models have been developed:
- TODO
In order to test it…
TODO add instructions
To understand how the model was developed, check my W&B report. TODO add report.
To train your own speech model:
- install requirements
  pip install -r requirements.txt
- make sure you're logged into W&B
  wandb login
- create a preprocessing function for your language
  TODO add more details
- run the training script
  TODO insert full command with comments on parameters.
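The preprocessing step above is not documented yet, so here is a minimal sketch of what such a function might look like, assuming the goal is to clean transcripts while keeping the punctuation the model should learn to output. The function name `preprocess_english`, the set of kept punctuation marks, and the choice not to lowercase are all assumptions for illustration, not the code used in this repository.

```python
import re
import unicodedata

# Hypothetical choice: punctuation marks kept in the training transcripts.
KEPT_PUNCTUATION = ".,?!'"

# Map typographic apostrophes to a plain one so the vocabulary stays small.
CHAR_MAP = {"\u2019": "'", "\u2018": "'"}


def preprocess_english(text: str) -> str:
    """Normalize a transcript while keeping sentence punctuation (illustrative only)."""
    # NFKC folds compatibility characters, e.g. the ellipsis character becomes "...".
    text = unicodedata.normalize("NFKC", text)
    for src, dst in CHAR_MAP.items():
        text = text.replace(src, dst)
    # Replace anything that is not a word character, whitespace,
    # or a kept punctuation mark with a space.
    allowed = re.escape(KEPT_PUNCTUATION)
    text = re.sub(rf"[^\w\s{allowed}]", " ", text)
    # Remove stray spaces before punctuation, then collapse repeated whitespace.
    text = re.sub(r"\s+([.,?!])", r"\1", text)
    return re.sub(r"\s+", " ", text).strip()


print(preprocess_english("Hello ,  world\u2026 It\u2019s  fine !"))
```

A function like this could then be applied to the transcript column of your dataset before building the vocabulary. Which characters to keep is language-specific, so the constants above would need to be adapted.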
You can also use W&B sweeps to optimize hyperparameters:
- define your sweep configuration file
  update language in sweep.yaml
- create a sweep -> this will return a sweep id
  wandb sweep sweep.yaml
- launch an agent against the sweep
  wandb agent my_sweep_id
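For readers who have not written a sweep configuration before, the sketch below shows the general shape of one, expressed as a Python dictionary with the same keys a YAML file would use (`program`, `method`, `metric`, `parameters`). The script name `train.py`, the metric name, and every parameter name are placeholders chosen for illustration; they are not taken from this repository's sweep.yaml.

```python
from typing import Any, Dict


def build_sweep_config(language: str, program: str = "train.py") -> Dict[str, Any]:
    """Return a W&B sweep configuration as a plain dictionary.

    `program`, the metric name, and the parameter names are hypothetical.
    """
    return {
        "program": program,  # script each agent runs (placeholder name)
        "method": "random",  # W&B also supports "grid" and "bayes"
        "metric": {"name": "eval/wer", "goal": "minimize"},
        "parameters": {
            # Fixed value: the language to train on.
            "language": {"value": language},
            # Sampled log-uniformly between the two bounds.
            "learning_rate": {
                "distribution": "log_uniform_values",
                "min": 1e-5,
                "max": 1e-3,
            },
            # Picked from a discrete list.
            "warmup_steps": {"values": [200, 500, 1000]},
        },
    }


config = build_sweep_config("fr")
print(config["parameters"]["language"])
```

A dictionary like this can be passed to `wandb.sweep(sweep=config, project=...)`, which returns a sweep id in the same way as the `wandb sweep` command.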
Note: you can just use my Docker image: borisdayma/wav2vec4humans
To build your own Docker image:
$ docker build -t username/wav2vec4humans -f Dockerfile .
To push it to Docker Hub:
$ docker push username/wav2vec4humans
Set up ovhai:
$ ovhai login
$ ovhai config set BHS `#choose BHS or GRA based on your region`
To launch an instance:
$ ovhai job run \
--gpu 1 \
-v datasets@BHS:/workspace/datasets:rw:cache `#pre-processed datasets` \
-v cache@BHS:/workspace/.cache:rw:cache `#cache requires high capacity` \
-e WANDB_API_KEY=xxxxx `#insert your key for auto-login` \
borisdayma/wav2vec4humans `#you can choose your own docker image`
Notes:
- once your dataset is created, you can load the volume in "ro" (read-only) instead of "rw" to avoid final sync
- you can automatically launch a command by adding -- my_command, for example -- wandb agent my_sweep_id
- remove local cache (rm -rf ~/.cache/**) before terminating your instance or it will take a very long time to sync back to its object storage (and each time you try to reload it)
- to remove cache storage in BHS region:
  ovhai data delete -ay bhs cache && ovhai data delete -y bhs cache
Built by Boris Dayma
For more details, visit the project repository.
If you have any questions about using W&B to track your model performance and predictions, please reach out to the Slack community.
This project would not have been possible without the help of so many, in particular:
- W&B for the great tracking & visualization tools for ML experiments;
- HuggingFace for providing a great framework for Natural Language Understanding;
- wav2vec2-sprint from Suraj Patil for helping me create the Dockerfile;
- OVHcloud for the great cloud computing infrastructure;
- the open source community who participated in the xlsr-wav2vec2 fine-tuning week and shared so many great tips!