HW1 ― End-to-end Speech Recognition

In this assignment, you are given a dataset of single-speaker audio files. Your task is to perform speech recognition on the dataset.

For more details, please click this link to view the slides of HW1.

Github Setup

Please create your own private GitHub repo to host your files. The repo name should be DLHLP2020-SPRING, so your repo link will be https://github.com/<YOUR_TEAM_GITHUB_ID>/DLHLP2020-SPRING. After that, you MUST register <YOUR_TEAM_GITHUB_ID> with this form and add the TAs' GitHub account (DLHLP2020-TA) as a collaborator on the repo; otherwise we cannot receive your submission.

For HW1, please create a folder named hw1 to hold all the files you use in HW1.

Dataset

In this repository, we have provided a shell script for downloading and extracting the dataset for this assignment. For Linux users, simply use the following command.

bash ./get_dataset.sh

The shell script will automatically download the dataset and store the data in a folder called data/DLHLP/ (this only works on Linux). If you are using another operating system, download the dataset from this link.

Keep your copy of the dataset on your local machine only. DO NOT upload the dataset to this remote repository.

Kaggle Submission Format

Kaggle link: https://www.kaggle.com/c/dlhlp2020spring-asr/overview

Submit your answers as a CSV file with , as the delimiter. The file should contain two columns: id and answer. id should be a six-digit number (e.g. 009001, corresponding to the audio file 009001.wav). Put the answer for each wav file in the answer column. There should NOT be any space between the answer and the delimiter. Sorting by id is not necessary.

id,answer
009001,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
009002,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
...,...
009999,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
010000,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
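
If you write the submission file yourself rather than using the provided format script, the rules above can be sketched in Python as follows. This is only an illustration: the function name format_submission and the dictionary of predictions are hypothetical and are not part of the sample code.

```python
import csv
import io


def format_submission(predictions):
    """Return submission CSV text for a mapping {utterance id: answer string}.

    Hypothetical helper for illustration; not part of the provided sample code.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=",", lineterminator="\n")
    writer.writerow(["id", "answer"])
    for utt_id, answer in predictions.items():
        # Zero-pad the id to six digits and strip spaces around the answer,
        # so nothing separates the answer from the delimiter.
        writer.writerow([f"{int(utt_id):06d}", answer.strip()])
    return buffer.getvalue()


example = {9001: "ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ", 10000: "ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ"}
print(format_submission(example), end="")
```

Spaces inside an answer are kept, since they separate the tokens; only leading and trailing spaces are stripped.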

If you use the sample code (End-to-end-ASR-Pytorch), you can produce the result by:

python3 main.py --config <config file> --test --njobs 8

After that, use this simple script to format:

python3 format.py <result csv file> <output file name>

Evaluation on dev set

To evaluate your model on the dev set, run the evaluation scripts provided in the starter code with the following commands.

Mean Levenshtein Distance

python3 eval_lev.py ans.dev.csv <predict_csv_file>

Word Error Rate (and Char Error Rate)

python3 eval_wer.py ans.dev.csv <predict_csv_file>
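
For intuition about what these metrics measure, here is a minimal Python sketch of the edit distance and the two scores built on it. It assumes that answers are tokenized by splitting on spaces; the provided eval_lev.py and eval_wer.py may tokenize or normalize differently, so use the official scripts for the numbers you report.

```python
def levenshtein(ref, hyp):
    """Edit distance between two sequences (insert, delete, substitute cost 1)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # delete a reference token
                curr[j - 1] + 1,         # insert a hypothesis token
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            ))
        prev = curr
    return prev[-1]


def mean_levenshtein(refs, hyps):
    """Average per-utterance edit distance (assumes space-separated tokens)."""
    distances = [levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps)]
    return sum(distances) / len(distances)


def word_error_rate(refs, hyps):
    """Total edit distance divided by the total number of reference tokens."""
    edits = sum(levenshtein(r.split(), h.split()) for r, h in zip(refs, hyps))
    total = sum(len(r.split()) for r in refs)
    return edits / total
```

Passing strings directly to levenshtein compares them character by character, which is the basis of a character error rate.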

Submission Rules

Kaggle Deadline

2020/03/22 (Sun.) 23:59

All files submission Deadline

2020/03/25 (Wed.) before class

File Submission Format

You should include the following files in the hw1/ directory of this repository:

report.pdf
The report of your homework assignment. Refer to the "Report Questions" section in the slides for what you should include in the report.
reproduce.sh
The shell script file for running your ASR model. The answer it produces should be the same as your best submitted file on the Kaggle public leaderboard. If the scores do not match, you will lose the 5 points for Kaggle.
Any other code or models you used.

We will run your code in the following manner:

bash ./reproduce.sh $1 $2

where $1 is the audio dataset directory (e.g. data/DLHLP), and $2 is the name of the output prediction csv file (e.g. ans.csv).

If your model is larger than GitHub's maximum file size (100MB), you can upload your model to Dropbox (see this tutorial). Your shell script files should be able to download the model automatically.
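
One way to make the download automatic is a small Python helper that reproduce.sh calls before running inference. The sketch below is an assumption about how you might organize this, not a required layout: ensure_model, MODEL_URL, and the file path are hypothetical names, and the URL must be a direct-download link to your own model file.

```python
import os
import urllib.request


def ensure_model(url, path):
    """Download the model file to `path` unless it already exists.

    Returns True if a download happened, False if the file was already there.
    """
    if os.path.exists(path):
        return False
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    # Download to a temporary name first so an interrupted download
    # is never mistaken for a complete model file.
    tmp_path = path + ".part"
    urllib.request.urlretrieve(url, tmp_path)
    os.replace(tmp_path, path)
    return True


# Hypothetical usage; replace both values with your own link and path:
# MODEL_URL = "https://example.com/path/to/model.pth"
# ensure_model(MODEL_URL, "hw1/model.pth")
```

Skipping the download when the file already exists keeps repeated runs of reproduce.sh fast.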

Packages

requirements.txt lists the packages you are allowed to import in this assignment.

Note that using packages with different versions will very likely lead to compatibility issues when we reproduce your results. If you use your own computer, make sure that you install the correct version as listed in requirements.txt. E-mail or ask the TAs first if you want to import other packages.
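
To check your own environment before submitting, a short script along these lines may help. It is a sketch that assumes requirements.txt pins packages in the name==version form; lines in any other form are ignored. The function names are hypothetical.

```python
from importlib import metadata


def parse_pins(lines):
    """Collect {package: version} from lines of the form name==version."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (required, installed or None)} for every mismatch."""
    mismatches = {}
    for name, required in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None  # package is not installed at all
        if installed != required:
            mismatches[name] = (required, installed)
    return mismatches


# Hypothetical usage from the repository root:
# with open("requirements.txt") as f:
#     print(find_mismatches(parse_pins(f)))
```

An empty result means every pinned package is installed at the listed version.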

Q&A

Post your question as a comment on this post
Contact TAs by e-mail (dlhlp.ta@gmail.com)
