wav2train

Automatic pipeline to prepare a directory full of (audio clip : transcript) file pairs for wav2letter training. Currently uses DSAlign for transcript alignment.

This project is part of Talon Research. If you find this useful, please donate.

Installation

This process works best on a Mac or Linux computer.

Debian

sudo apt install build-essential libboost-all-dev cmake zlib1g-dev libbz2-dev liblzma-dev \
                 python3 python3-pip ffmpeg wget
./setup

macOS

brew install python3 ffmpeg wget cmake boost
./setup

Usage

./wav2train input/ output/
# ./wfilter output/clips.lst > output/clips-filt.lst # not yet implemented
./wsplit output/clips.lst

Description

  1. Consumes a directory with audio and matching transcripts, such as:

    input/a.wav input/a.txt
    input/b.wav input/b.txt
    

     Most common audio formats (wav, flac, mp3, ogg, sph, etc.) are detected automatically, and formats can be mixed within the input directory. The audio files can be any length; the only requirement is that each text file is a transcription of its audio file. (A pairing sketch in Python follows this list.)

  2. Detects voice activity in each audio file and time-aligns the resulting segments to the transcript (currently via DSAlign).

  3. Extracts the voice segments into .flac files and creates a wav2letter-compatible clips.lst file (an extraction sketch follows this list).

  4. The output at this point looks like:

    output/clips/a.flac
    output/clips/b.flac
    output/clips.lst
    
  5. [Optional, not yet implemented] Use the wfilter tool to filter out "bad inputs" using a pretrained model and an error threshold.

    ./wfilter output/clips.lst > output/clips-filt.lst
    
  6. [Optional] Use the wsplit tool to auto-split a clips.lst file into dev.lst, test.lst, and train.lst (a split sketch follows this list).

    ./wsplit output/clips.lst
    # or, if you filtered:
    ./wsplit output/clips-filt.lst
    
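For illustration, step 1's pairing can be sketched in a few lines of Python. The find_pairs helper and the extension list below are hypothetical, not wav2train's actual implementation:

from pathlib import Path

# Hypothetical sketch of step 1: pair each audio file with the .txt
# transcript that shares its stem, e.g. input/a.wav <-> input/a.txt.
AUDIO_EXTS = {".wav", ".flac", ".mp3", ".ogg", ".sph"}  # assumed subset

def find_pairs(input_dir):
    for audio in sorted(Path(input_dir).iterdir()):
        if audio.suffix.lower() in AUDIO_EXTS:
            transcript = audio.with_suffix(".txt")
            if transcript.exists():
                yield audio, transcript

for audio, transcript in find_pairs("input"):
    print(audio, transcript)
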
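Steps 3 and 4 can be sketched as follows, assuming the aligned segments are already known. The segments data, file names, and ffmpeg settings here are placeholders; the list-file layout (id, path, duration in ms, transcript) follows wav2letter's documented format:

import subprocess
from pathlib import Path

# Hypothetical aligned segments for input/a.wav: (start_s, end_s, text).
segments = [(0.0, 2.4, "hello world"), (3.1, 5.8, "a second utterance")]

Path("output/clips").mkdir(parents=True, exist_ok=True)
with open("output/clips.lst", "w") as lst:
    for i, (start, end, text) in enumerate(segments):
        clip = f"output/clips/a_{i:04d}.flac"
        # Cut one segment and downmix to 16 kHz mono flac.
        subprocess.run(["ffmpeg", "-y", "-i", "input/a.wav",
                        "-ss", str(start), "-to", str(end),
                        "-ar", "16000", "-ac", "1", clip], check=True)
        # wav2letter list line: sample_id path duration_ms transcript
        lst.write(f"a_{i:04d} {clip} {(end - start) * 1000:.1f} {text}\n")
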
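The split in step 6 might look like this sketch; the 80/10/10 ratio and the fixed shuffle seed are assumptions, not necessarily what wsplit does:

import random

# Hypothetical 80/10/10 split of a wav2letter list file.
with open("output/clips.lst") as f:
    lines = f.readlines()
random.Random(0).shuffle(lines)  # fixed seed (assumed) for reproducibility

n_dev = n_test = len(lines) // 10
splits = {"dev.lst": lines[:n_dev],
          "test.lst": lines[n_dev:n_dev + n_test],
          "train.lst": lines[n_dev + n_test:]}
for name, subset in splits.items():
    with open(f"output/{name}", "w") as out:
        out.writelines(subset)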

Extras

./wplay output/clips.lst # print the transcript and play each clip, for debugging
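
A rough sketch of a wplay-style loop, assuming the wav2letter list layout (id, path, duration, transcript) and ffplay for playback; this is an illustration, not wplay's actual code:

import subprocess

# Hypothetical wplay-style loop over a wav2letter list file.
with open("output/clips.lst") as f:
    for line in f:
        sample_id, path, duration_ms, transcript = line.rstrip("\n").split(" ", 3)
        print(f"{sample_id}: {transcript}")
        # -nodisp: no video window; -autoexit: exit when playback finishes
        subprocess.run(["ffplay", "-nodisp", "-autoexit", path], check=True)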