
FiresideSubtitles

A Python project that provides video transcription, speaker diarization, and face detection in a single, simple package.

More to come!

Preparing Environment

  1. Create a .env file in the root of the cloned project with the following keys (a sketch showing how these values might be loaded appears after this list):

    • HUGGING_FACE_TOKEN: Specifies the Hugging Face access token to use
    • SHOULD_TRAIN_FACES_BEFORE_EXECUTION: Specifies whether face recognition data should be retrained before execution
    • SHOULD_SHOW_PREVIEWS: Specifies whether a preview of the current frame will be displayed during processing
    • FILENAME_TO_PROCESS: Specifies the filename to process, without its extension

    For example:

    HUGGING_FACE_TOKEN=xxxxx
    SHOULD_TRAIN_FACES_BEFORE_EXECUTION=1
    SHOULD_SHOW_PREVIEWS=1
    FILENAME_TO_PROCESS="Rick Astley - Never Gonna Give You Up"
  2. Accept the conditions of use for the gated pyannote/speaker-diarization-3.1 pipeline on Hugging Face. The HUGGING_FACE_TOKEN from step 1 is what grants access to this pipeline.

  3. Install Homebrew, then run brew install ffmpeg cmake in your Terminal.

  4. Run pip install -r requirements.txt in your Terminal, from the project root.
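
To illustrate how the pieces above fit together, here is a minimal sketch that loads the .env values and instantiates the diarization pipeline from step 2. It assumes the python-dotenv and pyannote.audio packages; the variable names and audio path are illustrative, not the project's actual code.

    import os

    from dotenv import load_dotenv
    from pyannote.audio import Pipeline

    load_dotenv()  # reads .env from the current working directory

    token = os.getenv("HUGGING_FACE_TOKEN")
    filename = os.getenv("FILENAME_TO_PROCESS")

    # Downloading the gated pipeline requires the token from step 1,
    # and only works after accepting its conditions of use (step 2).
    pipeline = Pipeline.from_pretrained(
        "pyannote/speaker-diarization-3.1",
        use_auth_token=token,
    )

    # Hypothetical audio path; diarization runs on the audio track.
    diarization = pipeline(f"media/input/{filename}.wav")
    for turn, _, speaker in diarization.itertracks(yield_label=True):
        print(f"{speaker}: {turn.start:.1f}s - {turn.end:.1f}s")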

Preparing Video Input

  1. Create a media folder in the project root.
  2. In the media folder, create an input folder, and put your video files in that folder (see the example layout below).
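
For example, with the FILENAME_TO_PROCESS value shown above (the .mp4 extension is illustrative; any container FFmpeg can read should work):

    media/
      input/
        Rick Astley - Never Gonna Give You Up.mp4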

Preparing Face Recognition Data

If you are using face recognition (label_faces), you will need to prepare a set of embeddings to use.

  1. Create a models folder in the project root.
  2. In the models folder, create a faces folder.
  3. In the faces folder, create one subfolder per person you want to identify, and place that person's photos inside it (see the example layout after this list).
  4. Run training.py to generate the embeddings required for face recognition.
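
An example layout, reusing the sample name from the .env file above:

    models/
      faces/
        Rick Astley/
          photo_01.jpg
          photo_02.jpg
        Another Person/
          photo_01.jpg

As a rough sketch of what embedding generation typically looks like, the snippet below walks the faces folder and encodes each photo. It assumes the widely used face_recognition package and a hypothetical output file; the project's actual training.py may differ.

    import pickle
    from pathlib import Path

    import face_recognition

    FACES_DIR = Path("models/faces")
    embeddings = []  # (person name, 128-d face encoding) pairs

    for person_dir in sorted(FACES_DIR.iterdir()):
        if not person_dir.is_dir():
            continue
        for photo in sorted(person_dir.glob("*.jpg")):
            image = face_recognition.load_image_file(photo)
            # Use the first face detected in each training photo.
            encodings = face_recognition.face_encodings(image)
            if encodings:
                embeddings.append((person_dir.name, encodings[0]))

    # Hypothetical output location; adjust to match the project.
    with open("models/face_embeddings.pkl", "wb") as f:
        pickle.dump(embeddings, f)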

Running

  1. Run main.py.
  2. Open the output folder in the media folder to see the generated output.
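
Putting it all together, a typical run from the project root looks like this (training.py is only needed when using face recognition):

    pip install -r requirements.txt
    python training.py
    python main.py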