AILabs ASR Python software development kit

Development Environment

Python 3.9

# on macOS, install PortAudio first (PyAudio depends on it)
brew install portaudio

pip install --global-option='build_ext' --global-option='-I/usr/local/include' --global-option='-L/usr/local/lib' -r requirements_dev.txt

# if you encounter issues installing PyAudio, see the PyAudio site:
# https://people.csail.mit.edu/hubert/pyaudio/
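Once the requirements are installed, one way to confirm that PyAudio (and the PortAudio library beneath it) works is to list the input devices it can see. The sketch below uses only PyAudio's own `PyAudio`, `get_device_count`, and `get_device_info_by_index` calls. The helper name `list_input_devices` is hypothetical and is not part of the ailabs-asr SDK.

```python
# Sketch: confirm PyAudio imports and list the microphones it can see.
# Uses PyAudio directly; nothing here comes from the ailabs-asr SDK.
from typing import List, Optional


def list_input_devices() -> Optional[List[str]]:
    """Return names of input-capable devices, or None if PyAudio is missing."""
    try:
        import pyaudio
    except ImportError:
        return None  # PyAudio (or its PortAudio dependency) is not installed

    pa = pyaudio.PyAudio()
    try:
        names = []
        for index in range(pa.get_device_count()):
            info = pa.get_device_info_by_index(index)
            # Only devices with at least one input channel can act as a microphone
            if info.get("maxInputChannels", 0) > 0:
                names.append(str(info["name"]))
        return names
    finally:
        pa.terminate()  # release PortAudio resources


if __name__ == "__main__":
    devices = list_input_devices()
    if devices is None:
        print("PyAudio is not installed; see the PyAudio site for help.")
    else:
        print(f"Found {len(devices)} input device(s): {devices}")
```

An empty list means PyAudio works but no microphone is available, so only the wav-file sample below can run.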

Installation

pip install ailabs-asr

Samples

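from ailabs_asr.streaming import StreamingClient  # import path assumed; check the installed package

# callbacks are assumed to receive one result dict (see Message Format below)
def on_processing_sentence(message):
  print(f'processing: {message["asr_sentence"]}')

def on_final_sentence(message):
  print(f'final: {message["asr_sentence"]}')

# init the streaming client
asr_client = StreamingClient('api-key-applied-from-devconsole')

# start streaming with a wav file
asr_client.start_streaming_wav(
  pipeline='asr-zh-en-std',
  file='voice.wav',
  verbose=False, # enable verbose to show detailed recognition results
  on_processing_sentence=on_processing_sentence,
  on_final_sentence=on_final_sentence)

# omit `file` to stream from the computer's microphone instead
asr_client.start_streaming_wav(
  pipeline='asr-zh-en-std',
  on_processing_sentence=on_processing_sentence,
  on_final_sentence=on_final_sentence)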

💡 start_streaming_wav() accepts callback functions that handle the recognition results; the result format is described in Message Format below.

💡 Look up the available pipelines in the next section.

💡 See the sample repository for more samples.
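The callback mechanism can be made concrete with a small handler class. This is a sketch under one assumption: that each callback receives a single dict shaped like the examples in Message Format. `TranscriptCollector` is a hypothetical name, not part of the SDK.

```python
# Sketch of callback handlers for start_streaming_wav().
# ASSUMPTION: each callback is called with one dict shaped like the
# "Message Format" examples (keys such as "asr_sentence", "asr_final").
from typing import Any, Dict, List


class TranscriptCollector:
    """Hypothetical helper: keeps the latest partial and all final sentences."""

    def __init__(self) -> None:
        self.partial: str = ""
        self.finals: List[str] = []

    def on_processing_sentence(self, message: Dict[str, Any]) -> None:
        # Assumed: a processing sentence is interim, so keep only the newest one
        self.partial = message.get("asr_sentence", "")

    def on_final_sentence(self, message: Dict[str, Any]) -> None:
        # A final sentence is complete: store it and clear the partial buffer
        self.finals.append(message.get("asr_sentence", ""))
        self.partial = ""

    def transcript(self) -> str:
        """Join all final sentences, one per line."""
        return "\n".join(self.finals)


collector = TranscriptCollector()
# Usage with the SDK (not executed here):
# asr_client.start_streaming_wav(
#     pipeline='asr-zh-en-std',
#     file='voice.wav',
#     on_processing_sentence=collector.on_processing_sentence,
#     on_final_sentence=collector.on_final_sentence)
```

Bound methods are ordinary callables, so they can be passed wherever the sample passes plain functions. Overwriting the partial buffer assumes interim results for a sentence supersede one another; confirm that against real output with verbose=True.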

Supported Languages (`pipeline`)

pipeline	Use case	Language(s)
asr-zh-en-std	Speakers use more Mandarin than English	Mandarin and English
asr-zh-tw-std	Speakers use Mandarin and Taiwanese	Mandarin and Taiwanese
asr-en-std	English speech	English
asr-jp-std	Japanese speech	Japanese
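Because the pipeline is passed as a plain string, a typo only surfaces once streaming starts. A hypothetical guard like the one below checks the name against the table before calling start_streaming_wav(). The dictionary is copied from the table above and would need updating if the service adds pipelines; the SDK itself may validate names differently.

```python
# Hypothetical helper: validate a pipeline name against the table above.
from typing import Dict

PIPELINES: Dict[str, str] = {
    "asr-zh-en-std": "Mandarin and English",
    "asr-zh-tw-std": "Mandarin and Taiwanese",
    "asr-en-std": "English",
    "asr-jp-std": "Japanese",
}


def check_pipeline(name: str) -> str:
    """Return the name unchanged if it is listed, else raise ValueError."""
    if name not in PIPELINES:
        supported = ", ".join(sorted(PIPELINES))
        raise ValueError(f"unknown pipeline {name!r}; supported: {supported}")
    return name


print(check_pipeline("asr-en-std"), "->", PIPELINES["asr-en-std"])
```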

Message Format

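There are two kinds of recognition results. In the samples below, 範例句子 means "example sentence" and 完整的範例句子 means "complete example sentence".

The Processing Sentence (Segment)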

{
  "asr_sentence": "範例句子"
}

The Final Sentence (Complete Sentence)

{
  "asr_final": true,
  "asr_begin_time": 9.314,
  "asr_end_time": 11.314,
  "asr_sentence": "完整的範例句子",
  "asr_confidence": 0.5263263653207881,
  "asr_word_time_stamp": [
    {
      "word": "完整的",
      "begin_time": 9.74021875,
      "end_time": 10.100875
    },
    {
      "word": "範例句子",
      "begin_time": 10.100875,
      "end_time": 10.1664375
    }
  ],
  "text_segmented": "完整的 範例句子"
}
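The final-sentence fields can be read with plain dictionary access. The sketch below assumes the message arrives as a parsed JSON object (a Python dict) with the keys shown above, and computes the sentence duration and each word's duration from the sample. `summarize_final` is a hypothetical helper.

```python
# Sketch: pull timing information out of a final-sentence message.
# ASSUMPTION: the message is delivered as a dict with the keys shown above.
from typing import Any, Dict, List, Tuple

# The "Final Sentence" sample above, written as a Python dict
SAMPLE_FINAL: Dict[str, Any] = {
    "asr_final": True,
    "asr_begin_time": 9.314,
    "asr_end_time": 11.314,
    "asr_sentence": "完整的範例句子",
    "asr_confidence": 0.5263263653207881,
    "asr_word_time_stamp": [
        {"word": "完整的", "begin_time": 9.74021875, "end_time": 10.100875},
        {"word": "範例句子", "begin_time": 10.100875, "end_time": 10.1664375},
    ],
    "text_segmented": "完整的 範例句子",
}


def summarize_final(message: Dict[str, Any]) -> Tuple[float, List[Tuple[str, float]]]:
    """Return the sentence duration and (word, duration) pairs, in seconds."""
    duration = message["asr_end_time"] - message["asr_begin_time"]
    words = [
        (item["word"], item["end_time"] - item["begin_time"])
        for item in message.get("asr_word_time_stamp", [])
    ]
    return duration, words


duration, words = summarize_final(SAMPLE_FINAL)
print(f"sentence: {SAMPLE_FINAL['asr_sentence']} ({duration:.3f} s)")
for word, word_duration in words:
    print(f"  {word}: {word_duration:.3f} s")
```

In this sample the word timestamps span roughly 9.74 s to 10.17 s, a narrower range than asr_begin_time to asr_end_time, so code should not assume the sentence boundaries equal the first and last word times.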

Limitations

Audio Data

⚠️ Send audio data as binary frames in the following format:

Audio data format
- 16 kHz sample rate (16,000 samples per second), mono
- 16 bits (2 bytes) per sample, PCM
- Data rate: 16,000 samples/s × 2 bytes = 32,000 bytes/s (about 32 KB/s)
- Chunk size: 2,000 bytes (1,000 samples, i.e. 1/16 s = 62.5 ms of audio)
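The audio spec above (16 kHz, mono, 16-bit PCM, 2000-byte chunks) can be checked in code. The sketch below uses Python's standard `wave` module to verify a WAV file's format and read it in 2000-byte chunks of 1,000 frames each. It illustrates the spec only and assumes you want to validate or chunk audio yourself; `iter_pcm_chunks` is a hypothetical helper, not an SDK function.

```python
# Sketch: check a WAV file against the audio spec and cut it into 2000-byte chunks.
import io
import wave
from typing import BinaryIO, Iterator, Union

SAMPLE_RATE = 16_000   # 16 kHz
CHANNELS = 1           # mono
SAMPLE_WIDTH = 2       # 16 bits per sample = 2 bytes
CHUNK_BYTES = 2_000    # chunk size from the spec

BYTES_PER_SECOND = SAMPLE_RATE * CHANNELS * SAMPLE_WIDTH  # 32,000
CHUNK_SECONDS = CHUNK_BYTES / BYTES_PER_SECOND            # 0.0625 s = 1/16 s


def iter_pcm_chunks(source: Union[str, BinaryIO]) -> Iterator[bytes]:
    """Yield raw PCM chunks of CHUNK_BYTES; the last chunk may be shorter."""
    with wave.open(source, "rb") as wav:
        actual = (wav.getframerate(), wav.getnchannels(), wav.getsampwidth())
        if actual != (SAMPLE_RATE, CHANNELS, SAMPLE_WIDTH):
            raise ValueError(f"expected 16 kHz, mono, 16-bit PCM WAV, got {actual}")
        frames_per_chunk = CHUNK_BYTES // (CHANNELS * SAMPLE_WIDTH)  # 1,000 frames
        while True:
            data = wav.readframes(frames_per_chunk)
            if not data:
                break
            yield data


if __name__ == "__main__":
    # Self-check with one second of synthetic silence written to memory
    buf = io.BytesIO()
    with wave.open(buf, "wb") as out:
        out.setnchannels(CHANNELS)
        out.setsampwidth(SAMPLE_WIDTH)
        out.setframerate(SAMPLE_RATE)
        out.writeframes(b"\x00\x00" * SAMPLE_RATE)
    buf.seek(0)
    print(len(list(iter_pcm_chunks(buf))), "chunks of", CHUNK_SECONDS, "s")
```

One second of conforming audio therefore splits into 32,000 / 2,000 = 16 chunks, matching the 1/16 s per chunk stated in the spec.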

TaiwanAILabs-Yating/asr-sdk-python