This is an end-to-end speech recognition server using a DeepSpeech2 model trained on the Hallym "AI Data of Disorder" dataset from AI-Hub Korea
(dataset source: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=608)
This project uses Kospeech both to train the DeepSpeech2 model and to run inference on audio data.
Requires Python 3.8.
- NumPy:
pip install numpy
- PyTorch: Refer to the PyTorch website to install the version appropriate for your environment.
- Pandas:
pip install pandas
- Matplotlib:
pip install matplotlib
- librosa:
conda install -c conda-forge librosa
- pydub:
pip install pydub
- torchaudio:
pip install torchaudio==0.6.0
- tqdm:
pip install tqdm
- sentencepiece:
pip install sentencepiece
- warp-rnnt:
pip install warp_rnnt
- hydra:
pip install hydra-core==1.1
- Flask:
pip install flask
- ffmpeg: Refer to the ffmpeg website for installation instructions.
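The pip-installable dependencies above can be collected into one requirements file. This is just a convenience sketch mirroring the list (PyTorch, librosa, and ffmpeg still need the separate install steps described above):

```
numpy
pandas
matplotlib
pydub
torchaudio==0.6.0
tqdm
sentencepiece
warp_rnnt
hydra-core==1.1
flask
```

Then everything can be installed at once with pip install -r requirements.txt.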
Add model.pt and the vocab dictionary CSV file to the root directory.
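The server uses that vocab CSV to map model output ids back to characters. Below is a minimal sketch of that lookup combined with a greedy CTC-style collapse (drop repeats, then drop blanks); the column names (`id`, `char`), the blank id 0, and the file name are assumptions, so adjust them to match your actual vocab file:

```python
import csv
import io

def load_vocab(csv_file):
    """Build an id -> character table from a Kospeech-style vocab CSV.

    Assumes columns named 'id' and 'char'; adjust to match the
    actual header of your vocab file.
    """
    id_to_char = {}
    for row in csv.DictReader(csv_file):
        id_to_char[int(row['id'])] = row['char']
    return id_to_char

def ids_to_text(ids, id_to_char, blank_id=0):
    """Greedy CTC-style decode: drop consecutive repeats, then drop blanks."""
    out = []
    prev = None
    for i in ids:
        if i != prev and i != blank_id:
            out.append(id_to_char.get(i, ''))
        prev = i
    return ''.join(out)

# Tiny illustration with an in-memory CSV; for a real file use
# open('vocab.csv', encoding='utf-8') instead.
sample = io.StringIO("id,char\n0,<blank>\n1,h\n2,i\n")
vocab = load_vocab(sample)
print(ids_to_text([1, 1, 0, 2], vocab))  # hi
```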
```javascript
const uploadAudio = async (uri) => {
  const filename = uri.split('/').pop()
  const formData = new FormData()
  formData.append('file', {
    uri: uri,
    name: filename,
    type: 'audio/m4a'
  })
  const options = {
    method: 'POST',
    body: formData,
    headers: {
      Accept: 'application/json',
      'Content-Type': 'multipart/form-data'
    }
  }
  await fetch('url/upload', options)
}
```
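On the server side, the Flask endpoint that receives this upload could look like the sketch below. Only the route name /upload and the multipart field name 'file' come from the client snippet above; the upload directory and the commented transcribe step are hypothetical placeholders for where the Kospeech inference would plug in:

```python
import os
from flask import Flask, request, jsonify

app = Flask(__name__)
UPLOAD_DIR = 'uploads'  # assumed location; change as needed
os.makedirs(UPLOAD_DIR, exist_ok=True)

@app.route('/upload', methods=['POST'])
def upload():
    # The client sends the recording under the multipart field name 'file'
    audio = request.files.get('file')
    if audio is None:
        return jsonify({'error': 'no file field in request'}), 400
    path = os.path.join(UPLOAD_DIR, audio.filename)
    audio.save(path)
    # Hypothetical next step: convert m4a -> wav (pydub/ffmpeg) and run
    # the Kospeech DeepSpeech2 model, e.g. text = transcribe(path)
    return jsonify({'filename': audio.filename}), 200

# Start the server with, e.g., app.run(host='0.0.0.0', port=5000)
```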
Kospeech: https://github.com/sooftware/kospeech
Deepspeech2: https://arxiv.org/abs/1512.02595