Speech-to-Text

A project for learning speech-to-text APIs and automatic report generation, using:

Python 3.10.1
Whisper AI (OpenAI)
AssemblyAI API?

Virtual environment

Y though?

Python installs packages globally by default, so the dependencies of different projects can conflict with each other. A virtual environment gets around this by giving the project its own isolated set of packages.

Virtual environment setup

Virtual environment created according to this guide

Set up a virtual environment:
- python -m venv venv
Activate it:
- source venv/bin/activate (on Windows: venv\Scripts\activate)
- If successful, your terminal prompt should look like this: (venv) $
Install packages using python -m pip install -r requirements.txt
- This should automatically install all relevant packages
Run the program
Deactivate the virtual environment with deactivate
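
To confirm from inside Python that the activation step worked, a minimal sketch like the following can help. The function name in_virtualenv is only illustrative and not part of this repository; the check relies on the fact that sys.prefix and sys.base_prefix differ inside an environment created with venv.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory,
    # while sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"Virtual environment active: {sys.prefix}")
    else:
        print("No virtual environment active; activate it before installing packages.")
```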

Packages

If using pip, install the following packages:

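openai
openai-whisper (provides the whisper module; the separate PyPI package named whisper is an unrelated project and can shadow it, so do not install both)
ffmpeg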
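
To see which of these packages are still missing from the active environment, a small sketch using the standard library's importlib.metadata could look like this. The helper name missing_packages and the example list are assumptions for illustration; adjust the names to match your requirements.txt.

```python
from importlib import metadata


def missing_packages(names):
    """Return the distribution names from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)  # raises PackageNotFoundError if absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Example distribution names; adjust to the packages you actually need.
    required = ["openai", "openai-whisper"]
    not_installed = missing_packages(required)
    if not_installed:
        print("Missing packages:", ", ".join(not_installed))
    else:
        print("All listed packages are installed.")
```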

How to generate `requirements.txt`

Activate the virtual environment as described in the previous section
Run python -m pip freeze > requirements.txt

Whisper AI

Usage

Whisper AI comes in several model sizes (tiny, base, small, medium and large) that trade speed for transcription quality. The medium model is a good balance between the two.
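
As a rough sketch of how a model is selected and used, the following wraps the load_model and transcribe calls from the openai-whisper package. The wrapper function transcribe and its default argument are illustrative only, not code from this repository.

```python
def transcribe(audio_path: str, model_name: str = "medium") -> str:
    """Transcribe an audio file with Whisper and return the recognized text."""
    # Imported here so this module can be imported without loading Whisper.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)   # returns a dict with a "text" key
    return result["text"].strip()
```

Calling transcribe("test.m4a") would transcribe the repository's test file with the medium model; passing model_name="base" should be faster at the cost of accuracy.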

Installing Whisper AI

Follow this guide to install Whisper AI:

https://pypi.org/project/openai-whisper/

Then activate the virtual environment and install the Whisper package. With pip, run:

pip install -U openai-whisper

Make sure ffmpeg is installed on your computer. If it is not, download the latest version (first link) and follow the guides (second and third links) to add the `ffmpeg` binary to your PATH environment variable:

https://ffmpeg.org/download.html
https://www.youtube.com/watch?v=5xgegeBL0kw
https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/

Then install the ffmpeg Python package. This package does not appear to include the ffmpeg binary itself, so the PATH setup above is still required:

pip install ffmpeg
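
To check that the ffmpeg binary is actually reachable through PATH, a short sketch using shutil.which from the standard library may be useful. The function name find_ffmpeg is hypothetical.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return the full path of the ffmpeg executable, or None if it is not on PATH."""
    return shutil.which("ffmpeg")


if __name__ == "__main__":
    path = find_ffmpeg()
    if path is None:
        print("ffmpeg was not found on PATH; Whisper will not be able to decode audio.")
    else:
        print(f"ffmpeg found at {path}")
```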

When using Windows, ensure that you have Chocolatey installed. If not, follow this guide:

https://chocolatey.org/install

Large Files

Two files are too big for GitHub (above 100 MB), so we need to use Git LFS. Start by installing it with this guide:

https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage

When using a virtual environment, the following two files are too big for GitHub. Get around this by discarding the changes to them before committing:

dnnl.lib
torch_cpu.dll

Follow this guide to use Git LFS:

https://www.youtube.com/watch?v=9HCsSD5PMSk

Open the repository in a terminal and run:

git lfs track "FILE.NAME"
git add .
git commit -m "COMMIT MESSAGE"
git lfs push --all origin main
git push -u origin main
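
To find out which files need to be tracked with Git LFS before committing, a sketch like the one below walks a directory and lists everything above the size limit. The helper find_large_files is hypothetical, and the limit is assumed to be 100 * 1024 * 1024 bytes.

```python
import os

# Assumed per-file limit: 100 * 1024 * 1024 bytes.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root, limit=GITHUB_LIMIT_BYTES):
    """Return the paths under `root` whose size exceeds `limit` bytes."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path) and os.path.getsize(path) > limit:
                large.append(path)
    return sorted(large)
```

Running find_large_files(".") from the repository root should list files such as dnnl.lib and torch_cpu.dll if the virtual environment lives inside the repository.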

Updating Whisper AI

To update the package to the latest commit of the OpenAI Whisper repository, run:

pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git

Testing

Test files

The repository contains a test file named test.m4a. The reference text can be found on the following website under the title "Why tunnels?":

https://www.boringcompany.com/
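
One simple way to compare a transcription of test.m4a with the reference text is a word-level similarity score. The sketch below uses difflib from the standard library; the function names are illustrative, and the score is a rough similarity ratio, not a formal word error rate.

```python
import difflib
import re


def normalize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_similarity(transcript, reference):
    """Return a similarity ratio between 0.0 and 1.0, compared word by word."""
    matcher = difflib.SequenceMatcher(None, normalize(transcript), normalize(reference))
    return matcher.ratio()
```

A ratio close to 1.0 suggests the transcript matches the reference closely, while differences in casing and punctuation are ignored by the normalization step.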

Useful development plugins

Python related:
- Python Environment Manager
- Python Indent
Git (gud):
General QoL:
Whisper AI (OpenAI)
- Whisper AI
Styling
- Uses ESLint and Prettier

MunchProductionz/Speech-to-Text
