Examples of programs built using Modal with OpenAI Whisper and Podchaser API to transcribe podcast episodes

Oncetold Podcast Transcriber

This is a clone of the Modal Podcast Transcriber.

This is a complete application that uses OpenAI Whisper to transcribe podcasts. Modal spins up 100-300 containers for a single transcription run, so hours of audio can be transcribed on-demand in a few minutes.
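The fan-out idea can be sketched with the standard library alone. This is a minimal illustration, not the app's actual code: the fixed 30-second windows, the function names, and the placeholder `transcribe_segment` are all assumptions, and a local thread pool stands in for Modal's containers.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_segments(total_seconds, segment_seconds=30.0):
    """Return (start, end) pairs covering the whole episode.

    Fixed-length windows are an assumption made for this sketch.
    """
    segments = []
    start = 0.0
    while start < total_seconds:
        end = min(start + segment_seconds, total_seconds)
        segments.append((start, end))
        start = end
    return segments


def transcribe_segment(segment):
    """Hypothetical stand-in for running Whisper on one audio segment."""
    start, end = segment
    return {"start": start, "end": end, "text": f"<text for {start:.0f}-{end:.0f}s>"}


def transcribe_episode(total_seconds, max_workers=8):
    """Fan segments out to parallel workers and collect results in order."""
    segments = split_into_segments(total_seconds)
    # Executor.map preserves input order, so results stay chronological.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(transcribe_segment, segments))


if __name__ == "__main__":
    results = transcribe_episode(3600)  # a one-hour episode
    print(len(results), "segments transcribed")
```

Because each segment is independent, wall-clock time is bounded by the slowest segment rather than by the length of the episode, which is why more workers shorten a run.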

Architecture

The entire application is hosted serverlessly on Modal and consists of 3 components:

  1. React + Vite SPA (pod_transcriber/frontend/)
  2. FastAPI server (pod_transcriber/api.py)
  3. Modal async job queue (pod_transcriber/main.py)

Developing locally

Requirements

  • an approved Modal.com account
  • npm
  • modal installed in your current Python virtual environment

Podchaser Secret

To run this on your own Modal account, you'll need to create a Podchaser account and create an API key.

Once you have the Podchaser account and API key, create a Modal Secret with the following keys:

  • PODCHASER_CLIENT_SECRET
  • PODCHASER_CLIENT_ID

The name you give the Modal Secret does not matter. What matters is that both keys, with their values, are listed in it. (Note: this will not work locally.)

You can find both values on Podchaser's API page.
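Modal exposes the keys of an attached Secret as environment variables inside the running function. The check below is a minimal sketch of reading them and failing early if one is missing. The helper name `load_podchaser_credentials` is hypothetical and not part of this repository.

```python
import os

# The two keys this README asks you to store in the Modal Secret.
REQUIRED_KEYS = ("PODCHASER_CLIENT_ID", "PODCHASER_CLIENT_SECRET")


def load_podchaser_credentials(environ=None):
    """Return the Podchaser credentials, or raise if any key is missing or empty.

    `environ` defaults to os.environ; passing a dict makes the check testable.
    """
    env = os.environ if environ is None else environ
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise RuntimeError("Missing Podchaser secret keys: " + ", ".join(missing))
    return {key: env[key] for key in REQUIRED_KEYS}
```

Calling a check like this at the start of a function reports a misnamed key directly, instead of leaving it to show up later as an authentication error.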

Vite build

cd into the pod_transcriber/frontend directory, and run:

  • npm install
  • npx vite build --watch

The last command will start a watcher process that will rebuild your static frontend files whenever you make changes to the frontend code.

Serve on Modal

Once you have vite build running, in a separate shell run this at the app root to start an ephemeral app on Modal:

modal serve oncetold-podcast-transcriber.main

Pressing Ctrl+C will stop your app.

Deploy to Modal

Once you're happy with your changes, cd to the root of the project and run modal deploy oncetold-podcast-transcriber.main to deploy your app to Modal.

Testing

The Modal.com deployment produces transcripts as HTML only, at a cost of about $0.15 per 60 minutes of audio. However, you can copy and paste what looks like an SRT version of the transcript into your own .srt file.
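If you have the transcript as timed segments rather than rendered HTML, writing an SRT file takes only a few lines. The sketch below assumes each segment is a dict with `start` and `end` in seconds and a `text` string, which is the shape of Whisper's segment output. The function names are hypothetical and not part of this repository.

```python
def format_timestamp(seconds):
    """Format seconds as the SRT timestamp HH:MM:SS,mmm."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments):
    """Render segments as SRT: index, time range, text, blank separator line."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{format_timestamp(seg['start'])} --> {format_timestamp(seg['end'])}\n"
            f"{seg['text'].strip()}\n"
        )
    # Joining with a newline leaves one blank line between consecutive cues.
    return "\n".join(blocks)


if __name__ == "__main__":
    demo = [
        {"start": 0.0, "end": 2.5, "text": " Welcome to the show."},
        {"start": 2.5, "end": 5.0, "text": " Today we talk about podcasts."},
    ]
    print(segments_to_srt(demo))
```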