An API to transcribe audio with OpenAI's Whisper Large v3! Powered by 🤗 Transformers, Optimum & flash-attn
Features:
- Transcribe audio to text at blazing fast speeds
- Fully open source and deployable on any GPU cloud provider
- Built-in speaker diarization
- Easy to use and Fast API layer
- Async background tasks and webhooks
- Optimized for concurrency and parallel processing
- Task management, cancel and status endpoints
- Admin authentication for secure API access
- Fully managed API available on JigsawStack
Based on the Insanely Fast Whisper CLI project. Check it out if you'd like to set up this project locally or understand the background of insanely-fast-whisper.
This project focuses on providing a deployable, blazing fast Whisper API with Docker on cloud infrastructure with GPUs for scalable production use cases.
With Fly.io's recent GPU service launch, I've set up the fly config file to easily deploy on Fly machines! However, you can deploy this on any other VM environment that supports GPUs and Docker.
Here are some benchmarks we ran on an Nvidia A100 - 80GB and Fly.io GPU infra:
Optimization type | Time to Transcribe (150 mins of Audio) |
---|---|
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2) | ~2 (1 min 38 sec) |
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2 + diarization) | ~2 (3 min 16 sec) |
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2 + fly machine startup) | ~2 (1 min 58 sec) |
large-v3 (Transformers) (fp16 + batching [24] + Flash Attention 2 + diarization + fly machine startup) | ~2 (3 min 36 sec) |
The estimated startup time for the Fly machine with GPU, including loading the model, is around 20 seconds. The rest of the time is spent on the actual computation.
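To put the table in perspective, the reported times can be converted into a real-time factor (audio duration divided by wall-clock time). This is a minimal sketch that uses only the numbers from the table above; the helper name `realtime_factor` and the shortened row labels are made up for illustration.

```python
AUDIO_SECONDS = 150 * 60  # 150 minutes of audio, as in the benchmark table

# Wall-clock times reported in the benchmark table, converted to seconds.
BENCHMARKS = {
    "fp16 + batching + Flash Attention 2": 1 * 60 + 38,
    "... + diarization": 3 * 60 + 16,
    "... + fly machine startup": 1 * 60 + 58,
    "... + diarization + fly machine startup": 3 * 60 + 36,
}


def realtime_factor(audio_seconds: float, wall_seconds: float) -> float:
    """Return seconds of audio processed per second of wall-clock time."""
    return audio_seconds / wall_seconds


for name, wall in BENCHMARKS.items():
    print(f"{name}: {realtime_factor(AUDIO_SECONDS, wall):.0f}x real time")
```

Each "fly machine startup" row is exactly 20 seconds slower than its counterpart without startup, which matches the estimated startup time.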
yoeven/insanely-fast-whisper-api:latest
Docker hub: yoeven/insanely-fast-whisper-api
- Make sure you already have access to Fly GPUs.
- Clone the project locally and open a terminal in the root.
- Rename the `app` name in the `fly.toml` if you like.
- Remove `image = 'yoeven/insanely-fast-whisper-api:latest'` in `fly.toml` only if you want to rebuild the image from the `Dockerfile`.
Install the fly CLI if you don't already have it.
You only need to run this the first time you launch a new Fly app:
fly launch
- Fly will prompt: `Would you like to copy its configuration to the new app? (y/N)`. Answer yes (`y`) to copy the configuration from the repo.
- Fly will prompt: `Do you want to tweak these settings before proceeding`. Most of the required settings are already configured in the `fly.toml` file, so if you have nothing to adjust, answer no (`n`) to proceed and deploy.
The first time you deploy it will take some time since the image is huge. Subsequent deploys will be a lot faster.
Run the following if you want to set up speaker diarization or an auth token to secure your API:
fly secrets set ADMIN_KEY=<your_token> HF_TOKEN=<your_hf_key>
Run `fly secrets list` to check if the secrets exist.
To get the Hugging Face token for speaker diarization, you need to do the following:
- Accept the `pyannote/segmentation-3.0` user conditions
- Accept the `pyannote/speaker-diarization-3.1` user conditions
- Create an access token at `hf.co/settings/tokens`.
Your API URL should look something like this:
https://insanely-fast-whisper-api.fly.dev
Run `fly logs -a insanely-fast-whisper-api` to view the logs of your Fly machine in real time.
Since this is a dockerized app, you can deploy it to any cloud provider that supports docker and GPUs with a few config tweaks.
JigsawStack provides a bunch of powerful APIs for various use cases while keeping costs low. This project is available as a fully managed API here with enhanced cloud scalability for cost efficiency and high uptime. Sign up here for free!
If you set up the `ADMIN_KEY` environment secret, you'll need to pass `x-admin-api-key` in the header with the value of the key you previously set.
If deployed on Fly, the base URL should look something like this:
https://{app_name}.fly.dev/{path}
Depending on the cloud provider you deploy to, the base URL will be different.
Transcribe or translate audio into text
Name | Value |
---|---|
url (Required) | URL of the audio |
task | `transcribe`, `translate`. Default: `transcribe` |
language | `None`, `en`, or another language. Default: `None` (auto-detects the language) |
batch_size | Number of parallel batches you want to compute. Reduce it if you run into out-of-memory (OOM) errors. Default: `64` |
timestamp | `chunk`, `word`. Default: `chunk` |
diarise_audio | Diarise the audio clips by speaker. You will need to set hf_token. Default: `false` |
webhook | Webhook `POST` call on completion or error. Default: `None` |
webhook.url | URL to send the webhook to |
webhook.header | Headers to send with the webhook |
is_async | Run the task in the background and send the results to the webhook URL. `true`, `false`. Default: `false` |
managed_task_id | Custom task ID used to reference an ongoing task. Default: a `uuid()` v4 is generated for each transcription task |
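The parameters above can be assembled into a request as follows. This is a minimal sketch using only the Python standard library, not the project's own client. It assumes the parameters are sent as a JSON body in a `POST` to the root path (`/`) of the base URL; neither detail is confirmed here, so check the route on your deployment. The audio and webhook URLs are placeholders, and `build_transcription_request` is a hypothetical helper name.

```python
import json
import urllib.request
from typing import Optional


def build_transcription_request(
    base_url: str,
    audio_url: str,
    admin_key: Optional[str] = None,
    **options,
) -> urllib.request.Request:
    """Build (but do not send) a POST request for a transcription task.

    ASSUMPTIONS: the endpoint path ("/") and the JSON body encoding are
    guesses for illustration; verify them against your deployment.
    """
    payload = {"url": audio_url, **options}
    headers = {"Content-Type": "application/json"}
    if admin_key:
        # Only needed when the ADMIN_KEY secret is set on the server.
        headers["x-admin-api-key"] = admin_key
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


request = build_transcription_request(
    "https://insanely-fast-whisper-api.fly.dev",
    "https://example.com/audio.mp3",  # placeholder audio URL
    admin_key="<your_token>",
    task="transcribe",
    batch_size=24,
    diarise_audio=False,
    is_async=True,
    webhook={"url": "https://example.com/hook", "header": {}},  # placeholder webhook
)
body = json.loads(request.data)
print(request.get_method(), request.full_url)
print(body)
# To actually send it: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to inspect the headers and body before spending GPU time on a call.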
Get all active transcription tasks, both async background tasks and ongoing tasks
Get the status of a task. Completed tasks are removed from the list, so requesting the status of a finished task may return an error.
Cancel an async background task. Only transcription jobs created with `is_async` set to `true` can be cancelled.
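For async jobs, a client would typically poll the status endpoint until the task finishes. The sketch below is transport-agnostic and rests on several assumptions: `fetch_status` is a hypothetical callable you supply (for example, a function that requests the status of your `managed_task_id`), and the `"status"` field and its values are invented for illustration rather than taken from the API. Because completed tasks are removed from the list, a lookup failure is treated as "task no longer tracked" rather than as a hard error.

```python
import time
from typing import Callable, Optional


def wait_for_task(
    fetch_status: Callable[[], dict],
    interval_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[dict]:
    """Poll until the task reports a terminal state or disappears.

    ASSUMPTIONS: `fetch_status` returns a dict with a "status" key and
    raises LookupError once the server no longer knows the task.
    Returns the last response, or None if the task is no longer tracked.
    """
    for _ in range(max_attempts):
        try:
            response = fetch_status()
        except LookupError:
            return None
        if response.get("status") in {"completed", "failed", "cancelled"}:
            return response
        sleep(interval_seconds)
    raise TimeoutError("task did not finish within the polling budget")


# Demo with a fake status source instead of a real HTTP call.
fake_responses = iter(
    [{"status": "pending"}, {"status": "pending"}, {"status": "completed"}]
)
result = wait_for_task(lambda: next(fake_responses), sleep=lambda _: None)
print(result)
```

When a webhook is configured, waiting for the webhook call is usually preferable to polling; this loop is only a fallback.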
# clone the repo
$ git clone https://github.com/jigsawstack/insanely-fast-whisper-api.git
# change the working directory
$ cd insanely-fast-whisper-api
# install torch
$ pip3 install torch torchvision torchaudio
# upgrade wheel and install required packages for FlashAttention
$ pip3 install -U wheel && pip3 install ninja packaging
# install FlashAttention
$ pip3 install flash-attn --no-build-isolation
# generate updated requirements.txt if you want to use other management tools (Optional)
$ poetry export --output requirements.txt
# get the path of python
$ which python3
# setup virtual environment
$ poetry env use /full/path/to/python
# install the requirements
$ poetry install
# run the app
$ uvicorn app.app:app --reload
Fly machines are charged by the second and might idle for up to 15 minutes before shutting themselves down. To save costs, you can shut down the machine when you're done with the API by sending a POST request to the following endpoint:
https://api.machines.dev/v1/apps/<app_name>/machines/<machine_id>/stop
Authorization header:
Authorization: Bearer <fly_token>
Learn more here
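The stop request can also be scripted. Below is a minimal sketch using the Python standard library, built from the endpoint and header shown above. The helper name `build_stop_request` is hypothetical, and the machine ID and token values are placeholders to replace with your own.

```python
import urllib.request


def build_stop_request(
    app_name: str, machine_id: str, fly_token: str
) -> urllib.request.Request:
    """Build (but do not send) the POST request that stops a Fly machine."""
    url = f"https://api.machines.dev/v1/apps/{app_name}/machines/{machine_id}/stop"
    return urllib.request.Request(
        url=url,
        headers={"Authorization": f"Bearer {fly_token}"},
        method="POST",
    )


request = build_stop_request(
    "insanely-fast-whisper-api",
    "machine-id-placeholder",  # replace with your machine ID
    "fly-token-placeholder",  # replace with your Fly API token
)
print(request.get_method(), request.full_url)
# To actually stop the machine: urllib.request.urlopen(request)
```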
- Vaibhav Srivastav for writing a huge chunk of the code and the CLI version of this project.
- OpenAI Whisper
This project is part of JigsawStack - A suite of powerful and developer friendly APIs for various use cases while keeping costs low. Sign up here for free!