🚀 Welcome to the Faster Whisper FastAPI project! This project provides a fast and efficient implementation of OpenAI's Whisper speech-to-text model, served through the FastAPI framework.
🔍 To get started with the Faster Whisper FastAPI project, you can follow these steps:
- Clone the project repository:

  ```bash
  git clone https://github.com/frankwongWO/faster-whisper-fastapi.git
  ```

- Install the required dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Start the FastAPI server (a minimal sketch of what the `run:app` module might contain follows this list):

  ```bash
  uvicorn run:app --reload --port 8123
  ```

- Open the API documentation in your web browser at http://localhost:8123/docs
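For orientation, here is a minimal sketch of what a `run.py` module exposing such a `/transcribe` endpoint could look like, built on faster-whisper's `WhisperModel` API. This is an illustration under assumptions, not the project's actual code: the per-request model loading, the temp-file handling, and the mapping of `to_lang` to faster-whisper's `language` parameter are all simplifications.

```python
# run.py -- minimal sketch, not the project's actual implementation.
import shutil
import tempfile
from typing import Optional

from fastapi import FastAPI, File, Form, UploadFile
from faster_whisper import WhisperModel

app = FastAPI()

@app.post("/transcribe")
async def transcribe(
    file: UploadFile = File(...),
    model_size: str = Form("large-v2"),
    device: str = Form("cuda"),
    compute_type: str = Form("float16"),
    to_lang: Optional[str] = Form(None),
):
    # Persist the upload to a temporary file so faster-whisper can read it.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        shutil.copyfileobj(file.file, tmp)
        audio_path = tmp.name

    # Loading the model on every request is simple but slow; a real server
    # would cache it across requests.
    model = WhisperModel(model_size, device=device, compute_type=compute_type)
    segments, info = model.transcribe(audio_path, language=to_lang)
    return {
        "language": info.language,
        "text": " ".join(segment.text for segment in segments),
    }
```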
Requirements:

- Python 3.9 or higher

Before using Faster Whisper FastAPI with a GPU, you also need to install CUDA and cuDNN. Here are the installation instructions:
You can download the CUDA installer from the NVIDIA website:

https://developer.nvidia.com/cuda-downloads

Run the CUDA installer and follow the instructions in the installation wizard.
You can download the cuDNN files from the NVIDIA website:

https://developer.nvidia.com/cudnn

Extract the cuDNN files to a directory (if the libraries are not found at runtime, see the troubleshooting note near the end of this document).
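To confirm that CUDA and cuDNN are actually visible to the runtime, one quick check is to ask ctranslate2 (the inference engine faster-whisper is built on) how many CUDA devices it can see. This assumes the project's dependencies are already installed:

```python
import ctranslate2

# 0 means the GPU stack (driver, CUDA, cuDNN) is not usable from this
# environment; 1 or more means the GPU is ready to use.
print(ctranslate2.get_cuda_device_count())
```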
To use the transcribe endpoint, send a POST request to the FastAPI server with the following parameters:
- model_size: a string specifying the size of the model to use. Valid values are "large-v2", "large-v1", "medium", "small", "base", and "tiny". You can find the available models on the Hugging Face Hub.
- device: a string specifying the device on which to run the model. Valid values are "cpu" and "cuda". Defaults to "cuda".
- compute_type: a string specifying the compute type to use. Valid values are "int8", "int8_float16", and "float16". Defaults to "float16".
- to_lang: a string specifying the target language of the transcription. Defaults to None.
- file: an uploaded file containing the audio data to transcribe.
For example:

```python
import requests

url = "http://localhost:8123/transcribe"
data = {
    "model_size": "large-v2",
    "compute_type": "float16",
}

# Open the audio file in binary mode and upload it as the "file" field.
with open("audio.mp3", "rb") as audio:
    files = {"file": ("audio.mp3", audio)}
    response = requests.post(url, data=data, files=files)

print(response.json())
```
In the example above, we use the requests library to send a POST request to the FastAPI server. We specify the model size and compute type (device defaults to "cuda"), and upload the audio file to transcribe. The server returns a JSON object containing the transcription result.
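The exact response schema is defined by the server, so treat the following as a purely hypothetical illustration of the kind of JSON you might get back:

```json
{
  "language": "en",
  "text": "Hello, world."
}
```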
On Windows, the server can also be started via the project's PowerShell script; run deactivate to leave the virtual environment when you are finished:

```powershell
./run.ps1
deactivate
```
For anyone having problems: copy all DLLs from cuDNN, as well as cublasLt64_11.dll from the GPU Computing Toolkit, into your ctranslate2 package directory. Since I'm using a venv, for me that was \faster-whisper\venv\Lib\site-packages\ctranslate2, but if you use Conda or a regular Python install without virtual environments, the path will be different.
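If you prefer to script that copy, here is a small sketch; the cuDNN and CUDA paths below are placeholders (assumptions), so adjust them to your own install locations:

```python
import glob
import os
import shutil

import ctranslate2

# Placeholder paths (assumptions) -- adjust to wherever you extracted cuDNN
# and installed the CUDA toolkit.
CUDNN_BIN = r"C:\tools\cudnn\bin"
CUDA_BIN = r"C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v11.8\bin"

# Destination: the installed ctranslate2 package directory.
dest = os.path.dirname(ctranslate2.__file__)

for dll in glob.glob(os.path.join(CUDNN_BIN, "*.dll")):
    shutil.copy(dll, dest)
shutil.copy(os.path.join(CUDA_BIN, "cublasLt64_11.dll"), dest)
print("Copied cuDNN DLLs and cublasLt64_11.dll to", dest)
```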
You can check which version of the CUDA toolkit is installed with:

```bash
nvcc -V
```
For more information on Faster Whisper FastAPI, please visit the GitHub repository: https://github.com/frankwongWO/faster-whisper-fastapi
I hope this information is helpful to you!