audiocorpusbuilder is a package that automatically builds a Russian-language audio corpus from YouTube playlists: it downloads each video's audio track and subtitles, builds "sound-text" pairs, and saves them to a directory. If a video has no subtitles, audiocorpusbuilder skips it.
Installation requires Python 3.6 or later and a Linux operating system on your local machine.
You can install it with these commands:
git clone https://github.com/dangrebenkin/audiocorpusbuilder.git
cd audiocorpusbuilder
python3 setup.py install
Before running audiocorpusbuilder, prepare directories for the audio tracks, subtitles, and results (directory paths should look like '/home/Audio/'). You also need to create a playlists.txt file containing the playlist links, one link per line.
All of the following arguments are required.
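The preparation step can also be scripted. The following is a minimal sketch and not part of audiocorpusbuilder: the helper name `prepare_workspace`, the directory names Audio, Subs and Results, the `corpus_workspace` base directory, and the placeholder playlist link are all assumptions made for illustration.

```python
from pathlib import Path


def prepare_workspace(base, playlist_urls):
    """Create the three working directories and write playlists.txt.

    Hypothetical helper for illustration; audiocorpusbuilder does not ship it.
    """
    base = Path(base)
    dirs = {}
    # One directory each for audio tracks, subtitles and results.
    for name in ("Audio", "Subs", "Results"):
        path = base / name
        path.mkdir(parents=True, exist_ok=True)
        dirs[name] = path
    # Every playlist link goes on its own line.
    playlists = base / "playlists.txt"
    playlists.write_text("\n".join(playlist_urls) + "\n", encoding="utf-8")
    return playlists, dirs


if __name__ == "__main__":
    # Placeholder link: replace it with real playlist URLs.
    urls = ["https://www.youtube.com/playlist?list=PLACEHOLDER"]
    playlists_file, directories = prepare_workspace("corpus_workspace", urls)
    print(playlists_file)
    for directory in directories.values():
        print(directory)
```

Calling the helper more than once is harmless for the directories, since existing ones are left in place, but playlists.txt is overwritten each time.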
- -p URL_list
Path to the text file with playlist links.
- -a directory_audio
Directory where the audio tracks are downloaded.
- -s directory_subtitles
Directory where the subtitles are downloaded.
- -r directory_results
Directory where the results are saved.
acbr -p URL_list -a directory_audio -s directory_subtitles -r directory_results
acbr -p playlists.txt -a Audio -s Subs -r Results
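If you prefer to start acbr from a Python script, the command line above can be assembled programmatically. This is a rough sketch under assumptions: the function names `build_acbr_command` and `run_acbr` are hypothetical and not part of audiocorpusbuilder, and only the four flags documented above are used.

```python
import shutil
import subprocess


def build_acbr_command(playlists, audio_dir, subs_dir, results_dir):
    """Return the acbr argument list built from the four documented flags."""
    return [
        "acbr",
        "-p", str(playlists),
        "-a", str(audio_dir),
        "-s", str(subs_dir),
        "-r", str(results_dir),
    ]


def run_acbr(playlists, audio_dir, subs_dir, results_dir):
    """Run acbr and return its exit code (hypothetical wrapper)."""
    # Fail early with a clear message if the acbr command is not installed.
    if shutil.which("acbr") is None:
        raise FileNotFoundError("acbr was not found on PATH")
    command = build_acbr_command(playlists, audio_dir, subs_dir, results_dir)
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    # Only print the command here; call run_acbr(...) to actually execute it.
    print(" ".join(build_acbr_command("playlists.txt", "Audio", "Subs", "Results")))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when a path contains spaces.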