Generate captions for videos using the power of OpenAI's Whisper API

Primary language: Python. License: MIT.

Phonix

What?

Phonix is a Python program that uses OpenAI's API to generate captions for videos.

It uses the Whisper model, an automatic speech recognition system that turns audio into text and can optionally translate it too. Compared to other solutions, it has the advantage that you can improve its transcription by providing a prompt that indicates the domain of the video. This means you may get better results for videos full of technical terms, acronyms and jargon if you include those terms in the prompt.
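As a rough illustration of how a domain prompt can be passed to the API, here is a minimal sketch using the official `openai` Python package. The helper names `build_prompt` and `transcribe` are hypothetical and not taken from Phonix's code; `whisper-1` is the name of the hosted Whisper model, and the sketch assumes `openai>=1.0` with the key available to the client.

```python
def build_prompt(terms):
    """Join domain terms into a short prompt string for Whisper."""
    # Drop blanks and duplicates while keeping the original order.
    seen = []
    for term in terms:
        term = term.strip()
        if term and term not in seen:
            seen.append(term)
    return ", ".join(seen)


def transcribe(client, audio_path, terms, response_format="srt"):
    """Send an audio file to the transcription endpoint with a domain prompt.

    `client` is expected to be an `openai.OpenAI()` instance (assumption:
    this is not necessarily how Phonix itself calls the API).
    """
    with open(audio_path, "rb") as audio_file:
        return client.audio.transcriptions.create(
            model="whisper-1",
            file=audio_file,
            prompt=build_prompt(terms),
            response_format=response_format,
        )
```

For a talk about Kubernetes, for example, you might call `transcribe(OpenAI(), "talk.mp3", ["Kubernetes", "kubectl", "etcd"])` after `from openai import OpenAI`; the file name and terms here are placeholders.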

Captivating captions

Phonix now supports "captivating" captions: captions that highlight the word currently being spoken and let you choose the maximum number of words shown in each caption. This lets you produce "influencer-style" captions with few words per caption and the current word highlighted. 💫
This is enabled through stable-ts, so you will need to install it (see below).

Overall the following options are available when it comes to styling the captions:

  • Highlight the current word
  • Choose the maximum number of words per caption
  • Choose the caption font size
  • Choose the caption font color
  • Choose the caption font family
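To make the first two options concrete, here is a small sketch of the idea in plain Python: given word-level timestamps, it groups words into captions of at most `max_words` words and emits one SRT cue per spoken word with that word wrapped in a `<font>` tag (a tag many players honor). This is only an illustration under those assumptions; the function names and the input format are hypothetical, and it is not how stable-ts or Phonix implement the feature.

```python
def format_timestamp(seconds):
    """Format seconds as an SRT timestamp, e.g. 3.5 -> 00:00:03,500."""
    millis = int(round(seconds * 1000))
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def highlighted_captions(words, max_words=3, color="#ffff00"):
    """Build SRT text with one cue per spoken word.

    `words` is a list of (text, start_seconds, end_seconds) tuples.
    Each cue shows its whole group of up to `max_words` words and wraps
    the word being spoken in a <font> tag.
    """
    cues = []
    for i in range(0, len(words), max_words):
        group = words[i:i + max_words]
        for current, (_, start, end) in enumerate(group):
            parts = [
                f'<font color="{color}">{text}</font>' if j == current else text
                for j, (text, _, _) in enumerate(group)
            ]
            cues.append((start, end, " ".join(parts)))
    # Number the cues from 1 and separate them with blank lines, as SRT expects.
    blocks = [
        f"{n}\n{format_timestamp(start)} --> {format_timestamp(end)}\n{text}\n"
        for n, (start, end, text) in enumerate(cues, start=1)
    ]
    return "\n".join(blocks)
```

Calling `highlighted_captions([("hello", 0.0, 0.4), ("big", 0.4, 0.7), ("world", 0.7, 1.2)], max_words=2)` yields three cues: two for the caption "hello big" (one per highlighted word) and one for "world".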

Why?

Captions are not just for the hearing impaired. They make your content more engaging by boosting your audience's focus, attention and comprehension while allowing them to watch your video without sound.

I was not particularly satisfied with the accuracy of YouTube's and LinkedIn's automatic captions, so I gave Whisper a try and was impressed by the results. Phonix makes it easy to use Whisper and generate captions for your videos.

How?

Phonix first extracts the audio from the video, then downsamples it if it is over 25 MB (the API's upload limit), and finally sends it to OpenAI's Whisper API. The API returns the captions in the specified format and Phonix saves them to a file. You can then use the captions in your video editor of choice.
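The extract-then-downsample step could look roughly like the sketch below, which shells out to ffmpeg. The function names, the output file name, the 32 kbps fallback bitrate and the reading of "25 MB" as 25 × 1024 × 1024 bytes are all assumptions made for illustration; Phonix's actual settings may differ.

```python
import os
import subprocess

# Assumption: "25 MB" is interpreted as 25 MiB.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024


def ffmpeg_extract_command(video_path, audio_path, bitrate_kbps=None):
    """Build an ffmpeg command that extracts a mono audio track from a video."""
    # -y overwrites the output, -vn drops the video stream, -ac 1 mixes to mono.
    command = ["ffmpeg", "-y", "-i", video_path, "-vn", "-ac", "1"]
    if bitrate_kbps is not None:
        # Lower the sample rate and audio bitrate to shrink the output file.
        command += ["-ar", "16000", "-b:a", f"{bitrate_kbps}k"]
    return command + [audio_path]


def needs_downsampling(size_bytes, limit=MAX_UPLOAD_BYTES):
    """Return True when the audio file is too large to upload as-is."""
    return size_bytes > limit


def extract_audio(video_path, audio_path="audio.mp3", low_bitrate_kbps=32):
    """Extract audio, re-encoding at a lower bitrate if the file is too large."""
    subprocess.run(ffmpeg_extract_command(video_path, audio_path), check=True)
    if needs_downsampling(os.path.getsize(audio_path)):
        subprocess.run(
            ffmpeg_extract_command(video_path, audio_path, low_bitrate_kbps),
            check=True,
        )
    return audio_path
```

Keeping the command construction separate from `subprocess.run` makes it easy to inspect or log the exact ffmpeg invocation before running it.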

Phonix was originally a command line application but I thought it'd be cool to create a simple GUI for it. Use whichever you feel more comfortable with.

Installation

  • Get an OpenAI API key
    • This is a paid service; transcribing a 25-minute South Park episode cost me around $0.30
  • Clone or download this repository
  • Install a recent version of Python with Tkinter
  • Install ffmpeg for your platform
  • Install Python dependencies: pip install -r requirements-basic.txt
    • If you want to transcribe locally without paying for the OpenAI API, run pip install -r requirements-advanced.txt instead and choose to run Whisper locally.
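Before the first run, it can help to confirm that the prerequisites listed above are in place. The following is a hedged sketch of such a check; the function name is hypothetical, and it assumes the API key is supplied through the `OPENAI_API_KEY` environment variable, which is the convention of the `openai` package but not necessarily how Phonix reads it.

```python
import importlib.util
import os
import shutil


def check_prerequisites(env=None):
    """Report which prerequisites appear to be available on this machine."""
    env = os.environ if env is None else env
    return {
        # ffmpeg must be on the PATH to be found here.
        "ffmpeg": shutil.which("ffmpeg") is not None,
        # Only checks that the tkinter package can be located.
        "tkinter": importlib.util.find_spec("tkinter") is not None,
        # Assumption: the key is provided via OPENAI_API_KEY.
        "api_key": bool(env.get("OPENAI_API_KEY")),
    }


if __name__ == "__main__":
    for name, ok in check_prerequisites().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing `api_key` is not a problem if you plan to run Whisper locally.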

Command line usage

phonix.py is the command line interface that also includes the main logic of the program.
It has a few options that you can see by running python phonix.py --help.

GUI usage

Assuming you have installed the dependencies, you can run the GUI with python phonix_gui.py. A demo of the tool can be found in this video.