ozen-toolkit

Audio datasets, easier.

OZEN toolkit, an AI-powered audio dataset helper.

OZEN is a small tool that helps you convert audio files into the LJ format.

Given a folder of audio files or a single audio file, it extracts the speech, transcribes it with Whisper, and saves the result in the LJ format (wav files in a wavs folder, plus train and valid txt files).
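
To make the output layout concrete, here is a minimal sketch of how an LJ-style dataset could be written. It is not OZEN's actual code. The function name write_lj_lists, the valid_ratio parameter, the train.txt and valid.txt file names, and the `wavs/<file>|<transcript>` line layout are all assumptions made for illustration.

```python
import random
from pathlib import Path


def write_lj_lists(entries, out_dir, valid_ratio=0.1, seed=0):
    """Write train.txt and valid.txt as `wavs/<file>|<transcript>` lines.

    entries: list of (wav_filename, transcript) pairs.
    valid_ratio: hypothetical parameter, fraction of entries used for validation.
    """
    out_dir = Path(out_dir)
    # The wav clips themselves would live here; this sketch only creates the folder.
    (out_dir / "wavs").mkdir(parents=True, exist_ok=True)

    # Shuffle a copy with a fixed seed so the split is reproducible.
    shuffled = list(entries)
    random.Random(seed).shuffle(shuffled)

    n_valid = int(len(shuffled) * valid_ratio)
    valid, train = shuffled[:n_valid], shuffled[n_valid:]

    for name, rows in (("train.txt", train), ("valid.txt", valid)):
        text = "".join(f"wavs/{wav}|{transcript}\n" for wav, transcript in rows)
        (out_dir / name).write_text(text, encoding="utf-8")

    return train, valid
```

Calling write_lj_lists([("clip_0.wav", "hello there")], "output", valid_ratio=0.0) would create output/wavs and an output/train.txt containing the single line `wavs/clip_0.wav|hello there`. The files OZEN itself produces may differ in naming and column layout.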

INSTALLATION

Accept the license terms at https://huggingface.co/pyannote/segmentation
Install Anaconda, or set up your own environment and install the requirements
git clone https://github.com/devilismyfriend/ozen-toolkit
Run Set Up Ozen.bat

USAGE

Drag a folder or a single file onto Drag_Here.bat to process it.

On the first run, you'll be prompted for a Hugging Face token. Once you provide it, a config file is created where you can specify which models to use, the desired validation/training data split, and more.

Alternatively, you can run ozen.py from the command line.