voice to speech

Transcribes an audio file into text with Whisper, then summarizes the transcript with GPT.

Installation

pip install git+https://github.com/yztxwd/voice_to_speech.git

Usage

You need an upgraded OpenAI API account to use this repo.

Find your Organization ID under Settings, and generate an API key under API keys (remember to copy and save the key right away!).
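The example commands in this README read the key and organization ID from the shell variables `$OPENAI_API_KEY` and `$ORGANIZATION_ID`. A minimal way to set them for the current session, assuming a POSIX-style shell (the values shown are placeholders, not real credentials):

```shell
# Placeholder values: replace with your own key and organization ID.
export OPENAI_API_KEY="sk-your-key-here"
export ORGANIZATION_ID="org-your-id-here"
```

Setting them in the session rather than typing them inline keeps the key out of each command you write, though it will still be stored in shell history if you type the `export` line interactively.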

Run the tiny Whisper model on the example audio file:

voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID data/audio.mp3

With the MPS accelerator (Apple M-series chips):

voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID --device mps data/audio.mp3 

With CUDA (NVIDIA):

voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID --device cuda:0 data/audio.mp3 
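If the same script has to run on different machines, the `--device` value can be picked with a rough heuristic. The sketch below is only an illustration: `pick_device` is a hypothetical helper, not part of this repo, and it assumes that a working `nvidia-smi` implies a usable CUDA device and that an arm64 Mac implies MPS support. It prints nothing when neither applies, so the flag can simply be left out.

```shell
# Hypothetical helper: guess a --device value from the host machine.
pick_device() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    # Assumption: nvidia-smi on PATH means the first CUDA GPU is usable.
    echo "cuda:0"
  elif [ "$(uname -s)" = "Darwin" ] && [ "$(uname -m)" = "arm64" ]; then
    # Assumption: Apple Silicon Mac, so MPS is available.
    echo "mps"
  else
    # No accelerator detected: print nothing and omit the flag.
    echo ""
  fi
}

device="$(pick_device)"
# Example invocation (left commented out here):
# voice_to_speech -m tiny -a $OPENAI_API_KEY -o $ORGANIZATION_ID ${device:+--device $device} data/audio.mp3
```

The `${device:+--device $device}` expansion adds the flag only when `pick_device` printed a value.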

If you only have a video file, use ffmpeg to extract the audio first, for example:

# Re-encodes the audio to MP3; whether re-encoding is needed depends on the audio format in the video
ffmpeg -i video.mp4 -map 0:a -acodec libmp3lame audio.mp3
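To make the note about re-encoding concrete: when the video's audio track is already MP3, it can be copied without re-encoding, and otherwise it has to be converted. The sketch below uses a hypothetical helper, `extract_cmd`, which only prints the ffmpeg command it would choose rather than running it. The audio codec name is passed in as an argument and can be looked up with ffprobe as shown in the comment.

```shell
# Look up the codec of the first audio stream (requires ffprobe), e.g.:
#   ffprobe -v error -select_streams a:0 -show_entries stream=codec_name \
#     -of default=noprint_wrappers=1:nokey=1 video.mp4

# Hypothetical helper: print an ffmpeg command for the given audio codec and input file.
extract_cmd() {
  codec="$1"
  input="$2"
  case "$codec" in
    mp3)
      # Audio is already MP3: copy the stream as-is, no re-encoding.
      printf 'ffmpeg -i %s -map 0:a -acodec copy audio.mp3\n' "$input"
      ;;
    *)
      # Any other codec: re-encode to MP3 with libmp3lame.
      printf 'ffmpeg -i %s -map 0:a -acodec libmp3lame audio.mp3\n' "$input"
      ;;
  esac
}
```

For example, `extract_cmd aac video.mp4` prints the same re-encoding command shown above, while `extract_cmd mp3 video.mp4` prints the stream-copy variant.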