This script automates the process of transcribing podcast episodes, generating summarized transcripts, and saving the results. It utilizes the OpenAI API and Python's glob
module to achieve this.
- Python 3.6 or later
- An OpenAI API key
- Install required Python packages using the following command:
pip install openai
Clone this repository: Set up your OpenAI API key as an environment variable:
export OPENAI_API_KEY=your-api-key
Organize your podcast audio files in the ./podcasts/ directory. Supported audio format: .m4a. It's recommend to have the podcasts with AAC HE-V2 Audio to stay below whisper's file size limit of 25mb.
The script will iterate through the podcast files in the ./podcasts/ directory, transcribe them, and generate a summarized transcript using the OpenAI GPT-3.5 Turbo model. Summarized transcripts will be saved in the ./results/ directory with the corresponding podcast number as the filename.
You can adjust the temperature parameter in the generate_corrected_transcript function to control the creativity of the generated summary. Modify the system_prompt to tailor the instructions for summarizing according to your specific needs.