A simple command-line tool to generate transcripts for podcast episodes.
- Download and process podcast episodes from a given MP3 URL.
- Automatically resamples audio to 16 kHz mono, since Groq downsamples to this format anyway.
- Splits large audio files into manageable chunks.
- Transcribes audio using the Groq API.
- Outputs transcripts in multiple formats:
  - DOTe JSON
  - Podlove JSON
  - WebVTT (subtitle format)
- Python 3.10 or higher
- ffmpeg installed and available in your system’s PATH.
- A Groq API key for transcription services.
Install the package:
pip install podcast-transcript # or: pipx install podcast-transcript, or: uv tool install podcast-transcript
The application requires a Groq API key to function. You can set the API key in one of the following ways:
- Environment Variable:
Set the GROQ_API_KEY environment variable in your shell:
export GROQ_API_KEY=your_api_key_here
# or
GROQ_API_KEY=your_api_key_here podcast-transcript ...
- .env File:
Create a .env file in the transcript directory (default is ~/.podcast-transcripts/) and add the following line:
GROQ_API_KEY=your_api_key_here
By default, transcripts are stored in ~/.podcast-transcripts/. You can change this by setting the TRANSCRIPT_DIR environment variable:
export TRANSCRIPT_DIR=/path/to/your/transcripts
You can also set the following environment variables or specify them in the .env file:
- TRANSCRIPT_MODEL_NAME: The name of the model to use for the transcript (default is "whisper-large-v3").
- TRANSCRIPT_PROMPT: The prompt to use for the transcription (default is "podcast-transcript").
- TRANSCRIPT_LANGUAGE: The language code for the transcription (default is en; set it to de for German, for example).
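The configuration rules above can be sketched in Python roughly as follows. This is an illustration of how the documented variables and defaults fit together, not the tool's actual implementation: the `Settings` dataclass and the `load_settings` function are hypothetical names, and the sketch only reads environment variables (it does not parse the .env file).

```python
import os
from dataclasses import dataclass
from pathlib import Path
from typing import Mapping


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the documented configuration values."""

    api_key: str
    transcript_dir: Path
    model_name: str
    prompt: str
    language: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Resolve settings from a mapping of environment variables.

    Defaults are the ones listed in this README; the function itself
    is an assumption made for illustration.
    """
    api_key = env.get("GROQ_API_KEY", "")
    if not api_key:
        raise RuntimeError("GROQ_API_KEY is not set")
    return Settings(
        api_key=api_key,
        # Default transcript directory, with "~" expanded to the home directory.
        transcript_dir=Path(
            env.get("TRANSCRIPT_DIR", "~/.podcast-transcripts")
        ).expanduser(),
        model_name=env.get("TRANSCRIPT_MODEL_NAME", "whisper-large-v3"),
        prompt=env.get("TRANSCRIPT_PROMPT", "podcast-transcript"),
        language=env.get("TRANSCRIPT_LANGUAGE", "en"),
    )
```

Passing the mapping as a parameter keeps the sketch easy to try out, for example `load_settings({"GROQ_API_KEY": "your_api_key_here", "TRANSCRIPT_LANGUAGE": "de"})`.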
To transcribe a podcast episode, run the transcribe command followed by the URL of the MP3 file:
transcribe <mp3_url>
Example:
transcribe https://d2mmy4gxasde9x.cloudfront.net/cast_audio/pp_53.mp3
The transcription process involves the following steps:
- Download the MP3 file from the provided URL.
- Resample the audio to 16 kHz mono for optimal transcription.
- Split the audio into chunks if it exceeds the size limit (25 MB).
- Transcribe each audio chunk using the Groq API.
- Combine the transcribed chunks into a single transcript.
- Generate output files in DOTe JSON, Podlove JSON, and WebVTT formats.
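The resampling, splitting, and combining steps above can be sketched as follows. This is a minimal sketch under stated assumptions, not the tool's actual code: the function names are hypothetical, the 25 MB limit is interpreted as 25 MiB, chunks are assumed to have equal duration, and each transcribed segment is assumed to be a dict with `start`, `end`, and `text` keys. The ffmpeg options used (`-i`, `-ar`, `-ac`, `-y`) are standard ffmpeg flags.

```python
import math
from pathlib import Path

SAMPLE_RATE_HZ = 16_000               # 16 kHz, as described in the steps above
MAX_CHUNK_BYTES = 25 * 1024 * 1024    # 25 MB limit (assumed here to mean MiB)


def resample_command(src: Path, dst: Path) -> list[str]:
    """Build an ffmpeg command that converts src to 16 kHz mono audio."""
    return [
        "ffmpeg", "-y",               # overwrite dst if it already exists
        "-i", str(src),               # input file
        "-ar", str(SAMPLE_RATE_HZ),   # output sample rate
        "-ac", "1",                   # one audio channel (mono)
        str(dst),
    ]


def chunk_count(file_size_bytes: int, limit: int = MAX_CHUNK_BYTES) -> int:
    """Return how many chunks are needed so that none exceeds the limit."""
    return max(1, math.ceil(file_size_bytes / limit))


def merge_segments(chunks: list[list[dict]], chunk_seconds: float) -> list[dict]:
    """Combine per-chunk segments into one list.

    Timestamps inside each chunk start at zero, so every segment is
    shifted by the start offset of the chunk it came from.
    """
    merged = []
    for index, segments in enumerate(chunks):
        offset = index * chunk_seconds
        for seg in segments:
            merged.append({
                "start": seg["start"] + offset,
                "end": seg["end"] + offset,
                "text": seg["text"],
            })
    return merged
```

The command list returned by `resample_command` could be passed to `subprocess.run(cmd, check=True)`; the timestamp shift in `merge_segments` is what keeps the combined transcript aligned with the original audio.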
The output files are saved in a directory named after the episode, within the transcript directory.
- DOTe JSON (*.dote.json): A JSON format suitable for further processing or integration with other tools.
- Podlove JSON (*.podlove.json): A JSON format compatible with Podlove transcripts.
- WebVTT (*.vtt): A subtitle format that can be used for captioning in media players.
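To give a feel for the simplest of these formats, here is a minimal sketch of rendering transcript segments as WebVTT. It is illustrative only: `format_timestamp` and `to_webvtt` are hypothetical helpers, and the segment shape (dicts with `start` and `end` in seconds plus `text`) is an assumption rather than the tool's internal data model.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(segments: list[dict]) -> str:
    """Render segments as a WebVTT document.

    A WebVTT file starts with the line "WEBVTT", followed by cues
    separated by blank lines; each cue has a timing line and its text.
    """
    lines = ["WEBVTT", ""]
    for seg in segments:
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        lines.append(f"{start} --> {end}")
        lines.append(seg["text"].strip())
        lines.append("")  # blank line terminates the cue
    return "\n".join(lines)
```

For example, `to_webvtt([{"start": 0.0, "end": 2.5, "text": "Hello"}])` yields a header line, a blank line, the timing line `00:00:00.000 --> 00:00:02.500`, and the cue text.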
- Add support for multitrack transcripts with speaker identification.
- Add support for other transcription backends (e.g., OpenAI, Speechmatics, local Whisper).
- Add support for other audio formats (e.g., AAC, WAV, FLAC).
- Add more output formats (e.g., SRT, TTML).
- Clone the repository:
git clone https://github.com/yourusername/podcast-transcript.git
cd podcast-transcript
- Create a virtual environment:
uv venv
- Install the package in editable mode:
uv sync
The project uses pytest for testing. To run tests:
pytest
Install pre-commit hooks to ensure code consistency:
pre-commit install
Check the type hints:
mypy src/
Build the distribution package:
uv build
Publish the package to PyPI:
uv publish --token your_pypi_token
This project is licensed under the MIT License.