In this assignment, you are given a dataset of single-speaker audio files. Your task is to perform speech recognition on the dataset.
For more details, please click this link to view the slides of HW1.
Please create your own private GitHub repo to host your files. The repo name should be DLHLP2020-SPRING, so your repo link will be https://github.com/<YOUR_TEAM_GITHUB_ID>/DLHLP2020-SPRING. After that, you MUST register <YOUR_TEAM_GITHUB_ID> with this form and add the TAs' GitHub account (DLHLP2020-TA) as a collaborator on the repo; otherwise we cannot receive your submission.
For HW1, please create a folder named hw1 to hold all the files you use in HW1.
In this repository, we have provided a shell script for downloading and extracting the dataset for this assignment. For Linux users, simply use the following command.
bash ./get_dataset.sh
The shell script will automatically download the dataset and store the data in a folder called data/DLHLP/ (only works on Linux). If you are using another operating system, you should download the dataset from this link.
You should keep a copy of the dataset only on your local machine. DO NOT upload the dataset to this remote repository.
Kaggle link: https://www.kaggle.com/c/dlhlp2020spring-asr/overview
All students should submit their answers in CSV format with , as the delimiter. Submission files should contain two columns: id and answer. id should be a six-digit number (e.g. 009001 for the audio file 009001.wav). The answer for each wav file should be put in the answer column. There should NOT be any space between the answer and the delimiter. Sorting by id is not necessary.
id,answer
009001,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
009002,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
...,...
009999,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
010000,ㄩˇ ㄧㄣ ㄅㄧㄢˋ ㄕˋ
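The format rules above can be sketched in Python as follows. This is only a minimal illustration, not part of the provided starter code: the function name format_submission and the idea of passing predictions as a dictionary are assumptions made for the example.

```python
import csv
import io


def format_submission(predictions):
    """Render {numeric id: answer} pairs as submission CSV text.

    Hypothetical helper for illustration; it is not part of the starter code.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["id", "answer"])
    for audio_id, answer in predictions.items():
        # Zero-pad the id to six digits and strip spaces that would
        # otherwise sit between the answer and the delimiter.
        writer.writerow([f"{int(audio_id):06d}", answer.strip()])
    return buffer.getvalue()
```

Writing the returned text to a file opened with encoding="utf-8" keeps the phonetic symbols intact. If you use the sample code, the format.py step described below already handles the formatting, so a helper like this is only relevant to a custom pipeline.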
If you use the sample code (End-to-end-ASR-Pytorch), you can produce the result by:
python3 main.py --config <config file> --test --njobs 8
After that, use this simple script to format:
python3 format.py <result csv file> <output file name>
To evaluate your model on the dev set, you can run the evaluation scripts provided in the starter code with the following commands.
python3 eval_lev.py ans.dev.csv <predict_csv_file>
python3 eval_wer.py ans.dev.csv <predict_csv_file>
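For intuition about what such scores measure, here is a minimal sketch of a token-level Levenshtein (edit) distance and the error rate derived from it. It is not the implementation of eval_lev.py or eval_wer.py, whose exact tokenization and normalization may differ; the function names and the whitespace tokenization are assumptions made for the example.

```python
def edit_distance(reference, hypothesis):
    """Levenshtein distance between two token sequences."""
    # previous[j] holds the distance between the reference prefix
    # processed so far and the first j hypothesis tokens.
    previous = list(range(len(hypothesis) + 1))
    for i, ref_token in enumerate(reference, start=1):
        current = [i]
        for j, hyp_token in enumerate(hypothesis, start=1):
            substitution = previous[j - 1] + (ref_token != hyp_token)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1]


def error_rate(reference, hypothesis):
    """Edit distance divided by the number of reference tokens."""
    ref_tokens = reference.split()
    hyp_tokens = hypothesis.split()
    if not ref_tokens:
        raise ValueError("reference must contain at least one token")
    return edit_distance(ref_tokens, hyp_tokens) / len(ref_tokens)
```

Because the distance counts insertions as well as substitutions and deletions, the error rate can exceed 1.0 when the hypothesis is much longer than the reference.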
2020/03/22 (Sun.) 23:59
2020/03/25 (Wed.) before class
You should include the following files in the hw1/ directory of this repository:
- report.pdf: The report of your homework assignment. Refer to the "Report Questions" section in the slides for what you should include in the report.
- reproduce.sh: The shell script file for running your ASR model. The produced answer should be the same as your best submitted file on the Kaggle public leaderboard. If the scores do not match, you will lose the 5 Kaggle points.
- Other code or models you used.
We will run your code in the following manner:
bash ./reproduce.sh $1 $2
where $1 is the audio dataset directory (e.g. data/DLHLP), and $2 is the name of the output prediction CSV file (e.g. ans.csv).
If your model is larger than GitHub's maximum file size (100MB), you can upload your model to Dropbox (see this tutorial). Your shell script files should be able to download the model automatically.
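One way to handle the automatic download, assuming reproduce.sh calls a small Python helper before running the model, is sketched below. The function name fetch_if_missing is hypothetical, and the sketch assumes the URL you pass is a direct-download link to the model file.

```python
import urllib.request
from pathlib import Path


def fetch_if_missing(url, destination):
    """Download url to destination unless the file already exists.

    Hypothetical helper for illustration; url is assumed to be a
    direct-download link.
    """
    destination = Path(destination)
    if destination.exists():
        # Skip the download so repeated runs reuse the same file.
        return destination
    destination.parent.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(destination))
    return destination
```

Skipping the download when the file is already present keeps repeated runs of the script fast and avoids unnecessary network traffic.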
requirements.txt lists the packages you are allowed to import in this assignment. Note that using packages with different versions will very likely lead to compatibility issues when we reproduce your results. If you use your own computer, make sure that you install the correct versions as listed in requirements.txt. E-mail or ask the TAs first if you want to import other packages.
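To check a local environment against the pinned versions, something like the following sketch may help. It assumes requirements.txt uses name==version lines (other specifier forms are ignored) and relies on importlib.metadata from the standard library (Python 3.8+); the function names are hypothetical.

```python
from importlib import metadata


def parse_pins(lines):
    """Return {package: version} for lines of the form name==version."""
    pins = {}
    for line in lines:
        # Drop trailing comments and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {package: (expected, installed)} for pins that differ.

    installed is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != expected:
            mismatches[name] = (expected, installed)
    return mismatches
```

For example, opening requirements.txt and calling find_mismatches(parse_pins(file_handle)) returns an empty dictionary when every pinned package is installed at the listed version.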
- Post your question as a comment on this post
- Contact TAs by e-mail (dlhlp.ta@gmail.com)