Speech Recognition Server for Language Disorders

Introduction

This is an end-to-end speech recognition server built on a DeepSpeech2 model trained on the Hallym AI Data of Disorder dataset from AI-Hub Korea.

(dataset source: https://aihub.or.kr/aihubdata/data/view.do?currMenu=115&topMenu=100&aihubDataSe=realm&dataSetSn=608)

This project uses Kospeech to train the DeepSpeech2 model and to run inference on audio data.
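
For context, single-file inference with a trained checkpoint roughly follows the pattern below. This is a minimal sketch modeled on the inference example bundled with Kospeech; the class and method names used here (KsponSpeechVocabulary, load_audio, model.recognize, label_to_string) come from that example and may differ between Kospeech versions, and the checkpoint, vocab, and audio paths are placeholders.

import numpy as np
import torch
import torchaudio
from torch import Tensor

from kospeech.data.audio.core import load_audio
from kospeech.vocabs.ksponspeech import KsponSpeechVocabulary

device = 'cpu'

def parse_audio(audio_path):
    # Filter-bank features matching the Kospeech pipeline (80 mel bins, 20 ms frames, 10 ms shift)
    signal = load_audio(audio_path, del_silence=True, extension=audio_path.split('.')[-1])
    feature = torchaudio.compliance.kaldi.fbank(
        waveform=Tensor(signal).unsqueeze(0),
        num_mel_bins=80, frame_length=20, frame_shift=10, window_type='hamming',
    ).transpose(0, 1).numpy()
    feature -= feature.mean()
    feature /= np.std(feature)
    return torch.FloatTensor(feature).transpose(0, 1)

# model.pt and the vocab CSV are loaded from the project root (placeholder file names)
vocab = KsponSpeechVocabulary('vocab.csv')
model = torch.load('model.pt', map_location=lambda storage, loc: storage).to(device)
model.device = device
model.eval()

feature = parse_audio('sample.wav')
input_length = torch.LongTensor([len(feature)])

# Greedy decoding with DeepSpeech2, then map label indices back to characters
y_hats = model.recognize(feature.unsqueeze(0), input_length)
print(vocab.label_to_string(y_hats.cpu().detach().numpy()))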

Requirements

Python version: 3.8

  • NumPy: pip install numpy
  • PyTorch: refer to the PyTorch website to install the version appropriate for your environment.
  • Pandas: pip install pandas
  • Matplotlib: pip install matplotlib
  • librosa: conda install -c conda-forge librosa
  • pydub: pip install pydub
  • torchaudio: pip install torchaudio==0.6.0
  • tqdm: pip install tqdm
  • sentencepiece: pip install sentencepiece
  • warp-rnnt: pip install warp_rnnt
  • hydra: pip install hydra-core==1.1
  • Flask: pip install flask
  • ffmpeg: refer to the ffmpeg website (a quick environment sanity check is sketched below)
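
After installing the packages above, the short check below can confirm that everything imports and that ffmpeg is visible on the PATH (pydub shells out to ffmpeg to decode m4a uploads). This is an optional helper, not part of the server itself.

import importlib
import shutil

# Try to import each dependency and print its version when it exposes one
for name in ('numpy', 'torch', 'pandas', 'matplotlib', 'librosa', 'pydub',
             'torchaudio', 'tqdm', 'sentencepiece', 'warp_rnnt', 'hydra', 'flask'):
    try:
        module = importlib.import_module(name)
        print(f'{name}: {getattr(module, "__version__", "ok")}')
    except ImportError as exc:
        print(f'{name}: MISSING ({exc})')

# pydub and the m4a-to-wav conversion step rely on the ffmpeg binary being on the PATH
print('ffmpeg:', shutil.which('ffmpeg') or 'NOT FOUND')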

Add model.pt and the vocab dictionary CSV file to the project root directory.
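
The client example in the next section posts recordings to an /upload route. Below is a minimal sketch of that route, assuming the Kospeech inference from the Introduction is wrapped in a hypothetical transcribe() helper; the temporary-file handling, port, and JSON response shape are illustrative choices rather than a fixed API.

import os
import tempfile

from flask import Flask, jsonify, request
from pydub import AudioSegment

app = Flask(__name__)

@app.route('/upload', methods=['POST'])
def upload():
    # Expo/React Native sends the recording as multipart form data under the key 'file'
    audio = request.files['file']

    with tempfile.TemporaryDirectory() as tmpdir:
        m4a_path = os.path.join(tmpdir, audio.filename)
        wav_path = os.path.splitext(m4a_path)[0] + '.wav'
        audio.save(m4a_path)

        # pydub uses ffmpeg to decode m4a; convert to wav before feature extraction
        AudioSegment.from_file(m4a_path, format='m4a').export(wav_path, format='wav')

        # transcribe() is a hypothetical wrapper around the inference sketch in the Introduction
        sentence = transcribe(wav_path)

    return jsonify({'transcript': sentence})

if __name__ == '__main__':
    app.run(host='0.0.0.0', port=5000)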

Example of a POST Request from Expo/React Native

const uploadAudio = async () => {
  // Derive a file name from the recording URI produced by Expo
  const filename = uri.split('/').pop()

  // Send the audio file as multipart form data under the key 'file'
  let formData = new FormData()
  formData.append('file', {
    uri: uri,
    name: filename,
    type: 'audio/m4a'
  })

  let options = {
    method: "POST",
    body: formData,
    headers: {
      Accept: "application/json",
      "Content-Type": "multipart/form-data"
    }
  }

  // Replace 'url' with the address of the speech recognition server
  const response = await fetch('url/upload', options)
  return await response.json()
}

References

Kospeech: https://github.com/sooftware/kospeech

Deepspeech2: https://arxiv.org/abs/1512.02595