Automation to improve the dataset export process for Common Voice
$ voice-corpora-automation
$ python setup.py build
$ python setup.py install
Environment variables
- CV_DATABASE_URL
  - Common Voice read-only replica
- CV_EXPORT_DIR
  - Path to store the clips TSV
- CV_S3_BUCKET
  - Common Voice clips S3 bucket
- CORPORA_EXPORT_DIR
  - Path to store the corpora TSV
- CORPORA_DATABASE_URL
  - Path to the corpora database
- CORPORA_DATABASE_TABLE
  - Corpora database table
- CORPORA_S3_BUCKET
  - S3 bucket to store the public clips
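
As a rough illustration of how a script might consume these settings, the Python sketch below reads the seven variables listed above and fails early if any of them is unset. The `load_config` helper and the `REQUIRED_VARS` tuple are hypothetical names chosen for this example; they are not taken from voice-corpora-automation itself, which may read its configuration differently.

```python
import os

# Variable names come from the list above; everything else is illustrative.
REQUIRED_VARS = (
    "CV_DATABASE_URL",
    "CV_EXPORT_DIR",
    "CV_S3_BUCKET",
    "CORPORA_EXPORT_DIR",
    "CORPORA_DATABASE_URL",
    "CORPORA_DATABASE_TABLE",
    "CORPORA_S3_BUCKET",
)


def load_config(environ=None):
    """Return a dict of the required settings (hypothetical helper).

    Raises RuntimeError naming every variable that is missing or empty.
    """
    env = os.environ if environ is None else environ
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing environment variables: " + ", ".join(missing)
        )
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_config()` with no arguments reads from the process environment; passing a plain dict instead makes the check easy to exercise without touching real settings.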
The MIT License (MIT)
voice-corpora-automation was written by John Giannelos.