A project to learn speech-to-text APIs and automatic report generation using:
- Python 3.10.1
- Whisper AI (OpenAI)
- AssemblyAI API?
Python handles dependencies poorly when every package is installed globally. We use virtual environments to get around this.
Virtual environment created according to this guide
-
Set up a virtual environment:
python -m venv venv
-
Activate it:
source venv/bin/activate
- If successful, your terminal prompt should look like this:
(venv) $
-
Install packages using
python -m pip install -r requirements.txt
- This should automatically install all relevant packages
-
Run program
-
Deactivate virtual environment with
deactivate
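To confirm from inside Python that the virtual environment is active, a minimal sketch can compare `sys.prefix` with `sys.base_prefix`, which differ inside an environment created by `venv`. The function name `in_virtualenv` is illustrative and not part of this project.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("virtual environment active:", in_virtualenv())
```

Running this before and after `source venv/bin/activate` should show the value change.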
If using pip, install the following packages:
whisper
openai
openai-whisper
ffmpeg
-
Activate the virtual environment as described in the previous section
-
Run
python -m pip freeze > requirements.txt
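To see roughly what ends up in requirements.txt, the sketch below lists the installed distributions as `name==version` lines using the standard library's `importlib.metadata`. It is an approximation only: the real `pip freeze` output can differ, for example for editable or Git-based installs, and the function name `frozen_requirements` is hypothetical.

```python
from importlib import metadata


def frozen_requirements() -> list[str]:
    """List installed distributions as 'name==version' lines, sorted by name."""
    lines = set()
    for dist in metadata.distributions():
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version")
        if name and version:  # skip distributions with incomplete metadata
            lines.add(f"{name}=={version}")
    return sorted(lines, key=str.lower)


if __name__ == "__main__":
    print("\n".join(frozen_requirements()))
```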
venv isn't deterministic and we may encounter errors in the future. Poetry is used as an alternative:
To start a poetry shell, use:
poetry shell
To deactivate and exit the shell, use:
exit
To only deactivate the virtual environment without leaving the shell, use:
deactivate
To run a single script with poetry, use:
poetry run python your_script.py
Whisper AI has multiple models that trade speed against quality. A good balance can be found using the medium model.
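As a rough illustration of selecting a model, the sketch below uses `whisper.load_model` and `model.transcribe` from the openai-whisper package. It assumes the package is installed and that the `ffmpeg` binary is on PATH. The wrapper function `transcribe` and its defaults are illustrative, not this project's actual code.

```python
def transcribe(audio_path: str, model_name: str = "medium") -> str:
    """Transcribe an audio file with Whisper and return the recognised text."""
    # Imported lazily so this file can be imported before openai-whisper is installed.
    import whisper

    model = whisper.load_model(model_name)  # downloads the weights on first use
    result = model.transcribe(audio_path)
    return result["text"]


if __name__ == "__main__":
    # test.m4a is the sample recording shipped with this repository.
    print(transcribe("test.m4a"))
```

Passing a smaller model name such as `"base"` should run faster at the cost of accuracy.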
Follow this guide to install Whisper AI:
https://pypi.org/project/openai-whisper/
Proceed to start the virtual environment and add the whisper package. When using pip, write:
pip install -U openai-whisper
Make sure you have ffmpeg installed on your computer. If not, download the latest version of ffmpeg (use the first link) and follow the guides (second and third link) to add the `ffmpeg` binary to your PATH environment variable:
https://ffmpeg.org/download.html
https://www.youtube.com/watch?v=5xgegeBL0kw
https://www.geeksforgeeks.org/how-to-install-ffmpeg-on-windows/
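To check whether the `ffmpeg` binary is actually reachable through PATH, a small sketch with the standard library's `shutil.which` can help. The function name `executable_on_path` is hypothetical.

```python
import shutil


def executable_on_path(name: str = "ffmpeg") -> bool:
    """Return True if an executable with this name can be found via PATH."""
    # shutil.which returns the full path of the executable, or None if not found.
    return shutil.which(name) is not None


if __name__ == "__main__":
    if executable_on_path():
        print("ffmpeg found at", shutil.which("ffmpeg"))
    else:
        print("ffmpeg not found; check your PATH environment variable")
```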
Proceed to install ffmpeg as a Python package using:
pip install ffmpeg
When using Windows, ensure that you have Chocolatey installed. If not, follow this guide:
https://chocolatey.org/install
There are 2 files that are too big for GitHub (above 100 MB), and we therefore need to use Git LFS. Start by following this guide:
https://docs.github.com/en/repositories/working-with-files/managing-large-files/installing-git-large-file-storage
When using a virtual environment, 2 files are too big for GitHub. Get around this by discarding the following changes before committing:
dnnl.lib
torch_cpu.dll
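To find out which files exceed the size limit before committing, a sketch like the following walks a directory and reports anything above a threshold. The 100 MB limit is taken from the text above and interpreted here as 100 MiB, which is an assumption. The function name `find_large_files` is illustrative.

```python
import os

# Assumed interpretation of the 100 MB limit mentioned above.
GITHUB_LIMIT_BYTES = 100 * 1024 * 1024


def find_large_files(root: str, limit: int = GITHUB_LIMIT_BYTES) -> list[str]:
    """Return the paths of all files under root that are larger than limit bytes."""
    large = []
    for dirpath, dirnames, filenames in os.walk(root):
        # Do not descend into Git's own object store.
        dirnames[:] = [d for d in dirnames if d != ".git"]
        for filename in filenames:
            path = os.path.join(dirpath, filename)
            if os.path.isfile(path) and os.path.getsize(path) > limit:
                large.append(path)
    return sorted(large)


if __name__ == "__main__":
    for path in find_large_files("."):
        print(path)
```

Files reported here are candidates for `git lfs track` or for leaving out of the commit.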
Follow this guide to use Git LFS:
https://www.youtube.com/watch?v=9HCsSD5PMSk
Use Git to open the repository and use:
git lfs track "FILE.NAME"
git lfs push --all origin main
git add .
git commit -m "COMMIT MESSAGE"
git push -u origin master
To update Whisper to the latest version from the openai/whisper repository, run:
pip install --upgrade --no-deps --force-reinstall git+https://github.com/openai/whisper.git
The repository contains a test file named test.m4a. The actual text can be found on the following website under the title "Why tunnels?":
https://www.boringcompany.com/
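To get a rough sense of how close a transcript of test.m4a is to the reference text, the sketch below compares the two word by word with the standard library's `difflib`. This is a simple similarity ratio, not a proper word error rate, and the function names are hypothetical.

```python
import difflib
import re


def normalise(text: str) -> list[str]:
    """Lower-case the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9']+", text.lower())


def word_similarity(transcript: str, reference: str) -> float:
    """Return a similarity ratio between 0 and 1, compared word by word."""
    matcher = difflib.SequenceMatcher(
        None, normalise(transcript), normalise(reference), autojunk=False
    )
    return matcher.ratio()


if __name__ == "__main__":
    # Placeholder strings; paste the Whisper output and the website text here.
    print(word_similarity("Hello, world!", "hello world"))
```

A ratio close to 1 suggests the transcript matches the reference well.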
-
Python related:
-
Git (gud):
-
General QoL:
-
Whisper AI (OpenAI)
-
Styling
- Uses ESLint and Prettier