Faster Whisper FastAPI

🚀 Welcome to the Faster Whisper FastAPI project! This project provides a fast and efficient way to run the Whisper speech-to-text model behind a FastAPI server.

Getting Started

🔍 To get started with the Faster Whisper FastAPI project, follow these steps:

  1. Clone the project repository:
git clone
  2. Install the required dependencies:
pip install -r requirements.txt
  3. Start the FastAPI server:
uvicorn run:app --reload --port 8123
  4. Open the API documentation in your web browser: http://localhost:8123/docs
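
Once the server is running, step 4 can also be confirmed from a script instead of a browser. The following is a minimal sketch, not part of the project: the helper names `docs_url` and `server_is_up` are hypothetical, and the host and port are simply the ones used in the steps above.

```python
import urllib.error
import urllib.request


def docs_url(host="localhost", port=8123):
    """Build the URL of the interactive API docs (host and port from the steps above)."""
    return f"http://{host}:{port}/docs"


def server_is_up(url, timeout=3.0):
    """Return True if the URL answers with HTTP 200, False if it cannot be reached."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, DNS failure, or an HTTP error status.
        return False


if __name__ == "__main__":
    url = docs_url()
    print(url, "is reachable" if server_is_up(url) else "is not reachable")
```

If the check reports that the server is not reachable, confirm that the uvicorn command from step 3 is still running and that the port matches.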


Requirements

  • Python 3.9 or higher


To run Faster Whisper FastAPI on a GPU (device "cuda", which is the default), you need to install CUDA and cuDNN first. Here are the installation instructions:

Install CUDA

You can download the CUDA installer from the NVIDIA website. Here are the steps:

Go to the following link to download the CUDA installer:

Run the CUDA installer and follow the instructions in the installation wizard.

Install cuDNN

You can download the cuDNN files from the NVIDIA website. Here are the steps:

Go to the following link to download the cuDNN files:

Extract the cuDNN files to a directory.


To use the transcribe function, you need to send a POST request to the FastAPI server with the following parameters:

  • model_size: a string specifying the size of the model to use. Valid values are "large-v2", "large-v1", "base", "tiny", "small", and "medium". You can find available models on the Hugging Face Hub.
  • device: a string specifying on which device to run the model. Valid values are "cpu" and "cuda". Defaults to "cuda".
  • compute_type: a string specifying which compute type to use. Valid values are "int8", "int8_float16", and "float16". Defaults to "float16".
  • to_lang: a string specifying the language to which the audio should be transcribed. Defaults to None.
  • file: an uploaded file object containing the audio data to transcribe.
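
The parameters above can be checked on the client before a request is sent, which gives a clearer error than a failed request. This is a rough sketch under stated assumptions: `build_form_data` is a hypothetical helper, and the allowed values are copied from the list above only, so the server may accept values that this check rejects.

```python
# Allowed values, copied from the parameter list above (an assumption:
# the server itself may accept more than these).
VALID_MODEL_SIZES = {"large-v2", "large-v1", "base", "tiny", "small", "medium"}
VALID_DEVICES = {"cpu", "cuda"}
VALID_COMPUTE_TYPES = {"int8", "int8_float16", "float16"}


def build_form_data(model_size, device="cuda", compute_type="float16", to_lang=None):
    """Validate the parameters and return them as form fields for the POST request."""
    if model_size not in VALID_MODEL_SIZES:
        raise ValueError(f"unsupported model_size: {model_size!r}")
    if device not in VALID_DEVICES:
        raise ValueError(f"unsupported device: {device!r}")
    if compute_type not in VALID_COMPUTE_TYPES:
        raise ValueError(f"unsupported compute_type: {compute_type!r}")

    data = {"model_size": model_size, "device": device, "compute_type": compute_type}
    if to_lang is not None:
        # Left out when None so that the server-side default applies.
        data["to_lang"] = to_lang
    return data
```

The returned dictionary can be passed as the form data of the request, alongside the uploaded file.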

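Here is example Python code that sends a POST request:

import requests

url = "http://localhost:8123/transcribe"

# Form fields; see the parameter list above for valid values.
data = {
    "model_size": "large-v2",
    "device": "cuda",
    "compute_type": "float16",
}

# Open the audio file in binary mode and upload it under the "file" field.
with open("audio.wav", "rb") as audio:
    files = {"file": ("audio.wav", audio)}
    response = requests.post(url, data=data, files=files)

print(response.json())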

In the above example, we use the requests library to send a POST request to the FastAPI server. We specify the model size, device, and compute type to use, and upload the audio data to transcribe as a file. The server will return a JSON object containing the transcribed text result.
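
For repeated use, the request can be wrapped in a small function that closes the file and fails loudly on HTTP errors. This is only a sketch: `transcribe_file` and its `post` parameter are hypothetical names, the 600-second timeout is an arbitrary choice, and because the exact shape of the returned JSON is not described here, the parsed object is returned unchanged.

```python
import os


def transcribe_file(path, url="http://localhost:8123/transcribe", post=None, **params):
    """Upload one audio file with the given form parameters and return the parsed JSON."""
    if post is None:
        # requests is third-party; importing it lazily lets a stand-in be injected.
        import requests
        post = requests.post

    with open(path, "rb") as audio:
        files = {"file": (os.path.basename(path), audio)}
        response = post(url, data=params, files=files, timeout=600)

    # Raise an exception for 4xx/5xx answers instead of parsing an error body.
    response.raise_for_status()
    return response.json()
```

A call might look like `transcribe_file("audio.wav", model_size="large-v2", device="cuda", compute_type="float16")`.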


Windows command

For anyone having a problem: copy all DLLs from cuDNN, as well as cublasLt64_11.dll from the GPU Computing Toolkit, into your ctranslate2 package directory. Since I'm using a venv, it was \faster-whisper\venv\Lib\site-packages\ctranslate2, but if you use Conda or regular Python without virtual environments, the path will be different.
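
Because the ctranslate2 directory depends on how Python is installed, it can be looked up programmatically rather than guessed. The following is a minimal sketch of the copy step described above: `ctranslate2_dir` and `copy_dlls` are hypothetical helpers, and the source directories are placeholders that you must replace with your own cuDNN and CUDA locations.

```python
import glob
import importlib.util
import os
import shutil


def ctranslate2_dir():
    """Return the installed ctranslate2 package directory, or None if it is not installed."""
    spec = importlib.util.find_spec("ctranslate2")
    if spec is None or spec.origin is None:
        return None
    # For a regular package, origin points at its __init__.py file.
    return os.path.dirname(spec.origin)


def copy_dlls(source_dirs, dest_dir, pattern="*.dll"):
    """Copy files matching pattern from each source directory; return the copied names."""
    copied = []
    for source in source_dirs:
        for path in sorted(glob.glob(os.path.join(source, pattern))):
            shutil.copy2(path, dest_dir)
            copied.append(os.path.basename(path))
    return copied


# Example usage (cudnn_bin and cuda_bin are placeholders for your own install paths):
#   dest = ctranslate2_dir()
#   copy_dlls([cudnn_bin], dest)
#   copy_dlls([cuda_bin], dest, pattern="cublasLt64_11.dll")
```

Run it with the same Python interpreter (venv, Conda environment, or system Python) that runs the server, so that the lookup resolves to the right package directory.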


Check the CUDA version. Run the following command in the command line:

nvcc -V
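
The same check can be scripted, for example to log the CUDA release when the server starts. This is a sketch with hypothetical helper names, and it assumes that the `nvcc -V` output contains the text "release X.Y", as in a line such as "Cuda compilation tools, release 11.8, V11.8.89".

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the release number (such as "11.8") from `nvcc -V` output, or None."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Run `nvcc -V` and return the CUDA release, or None if nvcc is not on PATH."""
    if shutil.which("nvcc") is None:
        return None
    result = subprocess.run(["nvcc", "-V"], capture_output=True, text=True, check=False)
    return parse_cuda_release(result.stdout)


if __name__ == "__main__":
    print("CUDA release:", installed_cuda_release())
```

A result of None means either that nvcc is not on PATH or that its output did not match the assumed format.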


For more information on Faster Whisper FastAPI, please visit the following GitHub repository:

I hope this information is helpful to you!