wmt_ai_study

My AI study

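Casting his body onto the sword's edge, how can he cherish his own life? Even his parents he cannot look after, let alone his children and wife!  
His name is entered in the roll of warriors; he may not dwell on private concerns. He gives his life for his country in its peril and looks on death as a return home!  
Cao Zhi, 《白马篇》 (White Horse)  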

Recorder code work

see https://github.com/weimingtom/wmt_recorder_study

Wireless code work

see https://github.com/weimingtom/wmt_iot_study

TFLite work

TFLite micro esp32, MSM261S4030H0R

(TODO) TFLite micro ARM, NXP MIMXRT1050-EVKB or Arm Virtual Hardware (AVH)

(TODO) ArduCAM/pico-tflmicro, ArduCam Pico4ML

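TinyML: 基于TensorFlow Lite在Arduino和超低功耗微控制器上部署机器学习 (TinyML: Deploying Machine Learning with TensorFlow Lite on Arduino and Ultra-Low-Power Microcontrollers)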

ML-KWS work

  • mlkws_stm32f411re_v8_first_success.rar
    (STM32) with STM32CubeMX, Keil MDK5 AC6 and NUCLEO-F411RE
    (stm32cubemx) Heap_Size==0x12000, Stack_Size==0x6000
    (DONE, SUCCESS, same result as mbed-cli) port to Keil MDK5 AC6 project
    (Dependency, using CMSIS version) search baidupan, CMSIS_5-5.4.0.zip
    https://github.com/ARM-software/CMSIS_5/releases/tag/5.4.0
    small change in CMSIS_5\CMSIS\Core\Include\core_cm4.h
//__FPU_PRESENT=1,
//#warning "Compiler generates FPU instructions for a device without an FPU (check __FPU_PRESENT)"
//#define __FPU_USED       0U
#if !defined(__FPU_PRESENT)
#define __FPU_PRESENT 1
#endif
#define __FPU_USED       1U

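《嵌入式系统案例教程》 (Embedded Systems Case Tutorial), Chapter 3, FPU: hardware floating point, __FPU_PRESENT and __FPU_USED. In system_stm32f4xx.c,
__FPU_PRESENT and __FPU_USED need to be defined globally;
see the function void SystemInit(void). stm32f429xx.h contains the macro definition __FPU_PRESENT=1.
Alternatively, selecting Use Single Precision under Floating Point Hardware in Keil 5
generates the macro definition __FPU_USED=1.
The stm32f429 is a Cortex-M4F, so it has an FPU.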

Total time : 164061 us  
Detected right (99%)  
Total time : 164060 us  
Detected right (99%)  

Speech-Recognition Python work

  • https://github.com/weimingtom/Speech-Recognition_mod
    work
  • Speech-Recognition-master_v3_success.tar.gz
    (Python3, TF2) With Tensorflow 2.x and xubuntu-20.04-desktop-amd64
    Keyword speech recognition, Speech commands, CNN, 10 words
    (install tf2) pip3 install tensorflow-cpu (if in xubuntu, use pip3 and python3)
    (origin) https://github.com/iamlekh/Speech-Recognition
  • Speech-Recognition_v7.tar.gz
    (Python3, TF2) with AI Studio, not done
    (install tf2) pip install tensorflow-cpu (if in AIStudio, use pip and python)

(TODO, porting difficult) voice_control_led, Maixduino MFCC-DTW + VAD (stm32-speech-recognition like)

  • voice_control_led_v9.rar
    (NOT DONE, many problems) TODO, Windows port with VS2013; the computed DTW distance comes out as zero, reason unknown
    (TODO, need DOC) about fft, see yinxiangbiji
  • voice_control_led_en_v2_success.rar
    for Maixduino, with Arduino IDE and Sipeed Maix Dock (K210)
    (origin) Maix_Speech_Recognition
    https://github.com/sipeed/Maixduino/blob/master/libraries/Maix_Speech_Recognition
    examples/voice_control_led_en/voice_control_led_en.ino
    examples/get_voice_model/get_voice_model.ino

(TODO) stm32-speech-recognition, STM32 MFCC-DTW + VAD

(TODO, NOT IMP) cortex-m-kws, aid_speech, tengine-lite

NOTE: This demo may not be a good choice,
because it lacks a training method and model data and only provides a model conversion tool.
Details see:
https://github.com/OAID/cortex-m-kws/blob/master/Documentation/Tengine%20on%20STM32F7%20Setup%20Manual.pdf
Tengine provides three versions:
a) https://github.com/OAID/Tengine version (only arm)
b) http://www.eaidk.com version (EAIDK-310, EAIDK-610)
c) http://www.tengine.org.cn version (rpi3b, rk3399, hi3516, etc)

(TODO) KWS_MCU, TC-ResNet8

https://github.com/Alex-Riviello/KWS_MCU

(TODO???) UT2UH/ML-KWS-for-ESP32

https://github.com/UT2UH/ML-KWS-for-ESP32
xtensa_copy_q7: where is it defined ???
https://github.com/UT2UH/ML-KWS-for-ESP32/tree/master/Source/XMSIS/NN

(TODO) VAD

(TODO) Python GMM Chapter07 speech_recognizer.py

(TODO) html5, MFCC-DTW

(TODO) micro_speech (TFLM 2.0) train

(TODO) TensorFlow Beginner Tutorial (17): Speech Recognition

search baidupan, 17.speech_recognition.zip
TensorFlow Beginner Tutorial (17): Speech Recognition (Part 1)
https://blog.csdn.net/rookie_wei/article/details/84527839

(TODO) diy-alexa, voice-controlled-robot, ESP32

https://github.com/atomic14/diy-alexa
Voice-Controlled Robot With the ESP32
https://github.com/atomic14/voice-controlled-robot

(TODO) MACE Micro Examples

(TODO) hyperconnect/TC-ResNet

https://github.com/hyperconnect/TC-ResNet
https://github.com/hyperconnect/TC-ResNet/blob/master/tflite_tools/run_benchmark.sh
https://github.com/tranHieuDev23/TC-ResNet
https://github.com/jianvora/Continual_Learning_for_KWS
https://github.com/olimjon-ibragimov/TC-ResNet-TensorFlow-2

(TODO) pytorch speech command tutorial

https://github.com/pytorch/tutorials/blob/master/intermediate_source/speech_command_recognition_with_torchaudio.py
https://pytorch.org/tutorials/intermediate/speech_command_recognition_with_torchaudio.html

(TODO) kws_game

https://github.com/chrisworld/kws_game

(TODO) tflite python

TensorFlow Lite Tutorial Part 3: Speech Recognition on Raspberry Pi
https://www.digikey.com/en/maker/projects/tensorflow-lite-tutorial-part-3-speech-recognition-on-raspberry-pi/8a2dc7d8a9a947b4a953d37d3b271c71
TensorFlow Lite Speech Recognition Demo
https://github.com/ShawnHymel/tflite-speech-recognition
[TensorFlow] 3 ways to install on Raspberry Pi
https://www.sejuku.net/blog/47178
TensorFlow Lite Python
https://tensorflow.google.cn/lite/guide/python
How to Run TensorFlow Lite Models on Raspberry Pi
https://blog.paperspace.com/tensorflow-lite-raspberry-pi/

(TODO) pico-wake-word

https://github.com/henriwoodcock/pico-wake-word
https://www.adafruit.com/product/1063
MAX4466 (like MAX9814???)
search baidupan, pico-wake-word-main.zip
tensorflow 2.4.1, 2021-03-31

(TODO) pytorch CRNN (???)

https://github.com/isadrtdinov/kws-attention

(TODO???) Edge Impulse, Recognize sounds from audio

https://docs.edgeimpulse.com/docs/audio-classification
http://www.elecfans.com/d/1532483.html
seeedstudio wio terminal tinyml
https://wiki.seeedstudio.com/Wio-Terminal-TinyML-EI-1/
https://wiki.seeedstudio.com/Wio-Terminal-TinyML-EI-3/

pico-microphone, Raspberry Pi Pico

while (1) {
	// store and clear the samples read from the callback
	int sample_count = samples_read;
	samples_read = 0;

	// loop through any new collected samples
	int maxValue = 0;
	for (int i = 0; i < sample_count; i++) { //or min(sample_count, 5)
		int value = sample_buffer[i];
		if (abs(value) > abs(maxValue)) { // samples are ints: use abs() from <stdlib.h>, not fabs()
			maxValue = value;
		}
	}
	//if M5stack PDM  
	printf("%d,%d,%d\n", maxValue / 128, -256, 256);
	//if max9814
	//printf("%d,%d,%d\n", maxValue, -512, 512);
	sleep_ms(50);
}
  • hello_pdm_microphone, using MP34DT01 PDM MEMS microphone <-> RPi Pico; why does this work? See the wiring below:
    (self) 3V<-> (self) SEL
    GND<-> GND
  • hello_analog_microphone, use MAX9814, for 3.3V and 5V
  • hello_analog_microphone, use MAX9813H, for 5V only
  • hello_analog_microphone, use MAX9812L, for 3.3V only, not good, settles slowly

(IMP, TODO) Edge Impulse, tensorflow 2.3

  • linux, xubuntu200464
    example-standalone-inferencing-master_v1.tar.gz
  • windows, vs2013
    helloei_v3_success.rar

Designed a V3S board, slowly updated voice assistant (LUCKY)

https://whycan.com/t_7000.html
Notes on building a custom Linux system (based on Allwinner V3s)
https://blog.csdn.net/qq_46604211/article/details/116024970
https://github.com/dakun-create/voice-assistant-byV3S

(???) NVIDIA NeMo

https://docs.nvidia.com/deeplearning/nemo/user-guide/docs/en/stable/starthere/tutorials.html

(TODO) Speech commands recognition with PyTorch | Kaggle 10th place solution in TensorFlow Speech Recognition Challenge

https://github.com/tugstugi/pytorch-speech-commands

(TODO) [Advanced Deep Learning - Hands-on Notes] Speech recognition basics

https://blog.csdn.net/weixin_41809530/article/details/106585116
[Advanced Deep Learning - Hands-on Notes] Speech recognition with the SPEECH_COMMANDS dataset
https://www.freesion.com/article/5523850245/
https://blog.csdn.net/weixin_41809530/article/details/106585116
https://blog.csdn.net/weixin_41809530/article/details/106669728

(TODO) Offline Speech Recognition on Raspberry Pi 4 with Respeaker

https://github.com/AIWintermuteAI/DeepSpeech_RaspberryPi4_Hotword
https://www.hackster.io/dmitrywat/offline-speech-recognition-on-raspberry-pi-4-with-respeaker-c537e7

(IMP) esp32 dsp (with asm)

https://github.com/espressif/esp-dsp

(???) Python, sipeed MaixPy, maix-asr, (speech_recognizer)

https://blog.csdn.net/xuguoliang757/article/details/118462079
https://github.com/sipeed/MaixPy_scripts/blob/master/multimedia/speech_recognizer/maix_asr_2900k_0x500000.kmodel
https://bbs.sipeed.com/thread/988

(IMP, TODO) TensorFlow语音识别实战 (TensorFlow Speech Recognition in Practice)

  • search book_TensorFlow语音识别实战
  • search baidupan, TensorFlow语音识别实战-源代码.7z
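  • Chapter 10: converting speech to Chinese characters based on MFCC and CTC
  • Chapter 1: The road to speech recognition; 1.7 Hands-on: keyword-based voice wake-up
  • This book contains a lot of original code and is worth studying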
  • tensorflow-gpu==2.1.0

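(TODO) Backup of my earlier deepspeech and vosk runs on Raspberry Pi 3B

  • Apart from Sphinx, the fairly complete offline speech recognition engines that can run on a Raspberry Pi are deepspeech and vosk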
    search work_deepspeech_vosk_raspberry_pi_upload
    search deepspeech_readme.txt

(TODO) Openai-whisper (for pyTorch) and whisper.cpp (for C++)

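  • (Possibly) a new option: openai-whisper
    I have run it under Python 3.7 (with the oldest openai-whisper version); it needs ffmpeg.
    Tested on PC only, not on a Raspberry Pi. The model file is fairly small, but recognizing English words may take more than 30 seconds
  • A faster Whisper inference implementation: whisper.cpp
    https://github.com/ggerganov/whisper.cpp
    Fairly easy to build (on Ubuntu and AI Studio); works with the tiny-en model and can recognize sentences, but recognizing single words seems to have problems (not sure why), to be investigated
  • whisper and whisper.cpp command line
Has anyone tried how fast whisper.cpp runs on a Raspberry Pi 4B? I previously ran sentence recognition on AI Studio with openai-whisper (20230117)
and with whisper.cpp: the former took about 10 seconds, the latter about 3 seconds. Part of the reason the latter is faster is that the model file format changed (it uses ggml).
The commands are as follows (I should probably keep a record):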
whisper 2830-3980-0043.wav --language en --model tiny.en
./main -m models/ggml-tiny.bin -f samples/audio/2830-3980-0043.wav
  • whisper.cpp on rpi4
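I ran whisper.cpp successfully on a Raspberry Pi 4B (4 GB) with the tiny-en model (it is the main process shown in the task manager).
It uses roughly 100% of the CPU
and 150 MB of memory (rising to 450 MB under the desktop environment), and takes about 10 seconds (my earlier AI Studio test took 3 seconds),
which is about the speed of the PyTorch version of whisper. How fast the PyTorch version of whisper runs on the Raspberry Pi 4B has not been tested yet.

I then tried running the PyTorch version of whisper on the Raspberry Pi 4B (4 GB), and it failed. Installing it was easy
(using the -i option to select a faster mirror), but it fails at runtime with errors I do not know how to fix. I will come back to it after studying PyTorch;
for now I do not need it much, unless I learn to read PyTorch code and to debug on the Raspberry Pi.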
  • whisper python, audio-0.6.0.tar.gz, Makefile
	pip install openai-whisper==20230117
	whisper yes.2a6d6pep.wav --language en --model tiny.en
	(cd audio; whisper 2830-3980-0043.wav --language en --model tiny.en)
	(cd audio; whisper 4507-16021-0012.wav --language en --model tiny.en)
	(cd audio; whisper 8455-210777-0068.wav --language en --model tiny.en)
  • whisper.cpp, build.sh
./main -m models/ggml-tiny.bin -f samples/jfk.wav 
./main -m models/ggml-tiny.bin -f samples/audio/2830-3980-0043.wav
./main -m models/ggml-tiny.bin -f samples/audio/4507-16021-0012.wav
./main -m models/ggml-tiny.bin -f samples/audio/8455-210777-0068.wav
rm -rf samples/output.wav
ffmpeg -i samples/yes.2a6d6pep.wav -ar 16000 -ac 1 -c:a pcm_s16le samples/output.wav
./main -m models/ggml-tiny.bin -f samples/output.wav

micro_speech

Facebook flashlight and wav2letter, in C++

Swift机器学习:面向iOS的人工智能实战 (Swift Machine Learning: Hands-on AI for iOS)

TensorFlow Model

PaddleSpeech

(TODO?) TinyMaix, KWS example

(TODO, IMP) Demonstrator project for speech recognition featuring several technologies, with tensorflow.js

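(TODO) Micro speech recognition framework based on DNN and DTW algorithms with VAD segmentation, some sources lost, Keil

(TODO) LSTM + CTC (?), TensorFlow 2深度学习实战 (TensorFlow 2 Deep Learning in Practice), Posts & Telecom Press

I tried running the LSTM example code from 《TensorFlow 2深度学习实战》 on AI Studio,
and it does run (I used TensorFlow 2.3).
However, I had earlier tried another LSTM speech recognition example, and that one failed:
《Python+TensorFlow机器学习实战》, section 9.2
(it may already be fixed here: llSourcell/tensorflow_speech_recognition_demo).
When I have time I can compare the two and fix it.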

(TODO) LSTM + CTC (???), Python自然语言处理实战 (Python Natural Language Processing in Practice)

TFLearn LSTM, tensorflow_speech_recognition_demo, and 《Python+TensorFlow机器学习实战》, section 9.2

(TODO, IMP, not exists now) ESP8266 Baidu ASR Demo (AT command)

app/include/rest.h, lib/librest.a, vop.baidu.com/pro_api    
int send_baidu(int sta_socket);

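Speech recognition based on Baidu Speech, speech-to-text

ESP32 voice wake-up + offline recognition + Baidu online recognition

(TODO) DTW, 用“芯”探核:龙芯派开发实战 (Loongson Pi development in practice)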

(TODO) TinyML Cookbook

Deep Learning for NLP and Speech Recognition

Voiceprint-Recognition

sherpa-ncnn

ThatProject, ESP32_MICROPHONE

(TODO) Keras 2D CNN+RNN+CTC

KWS-SoC: development notes on a Wujian100-based audio-stream keyword detection SoC extension, part 2

IoT for Beginners - A Curriculum, Recognize speech with an IoT device

PyTorch2.0深度学习从零开始学 (PyTorch 2.0 Deep Learning from Scratch), pytorch

  • Chapter 14, Create your own little sprite: MFCC-based voice wake-up in practice
  • PyTorch2.0深度学习从零开始学-源码.rar

(TODO) NLP

  • Chinese corpora, Chinese corpus data
  • Knowledge graph, 知网, csdn, wordnet
  • search baidupan, 语料,Corpus

(TODO) Whisper.cpp for Android

(TODO???) gd32f450-run-tflite-micro-speech-demo

(TODO) SYSTRAN/faster-whisper

whisper (python version, pytorch version, openai-whisper) installed by pip of raspberry pi 4b

  • Installing whisper on Raspberry Pi 4B (don't sudo pip install)
    The newest version of openai-whisper works; there is no need to install the old version 20230117.
    After installing, pip uninstall torch and reinstall the old version torch==1.13.1, see below.
pip install -i https://pypi.tuna.tsinghua.edu.cn/simple openai-whisper==20230117  
pip install typing-extensions==4.3.0
/home/pi/.local/bin/whisper --help
pi@raspberrypi:~/whisper.cpp_aistudio/audio $ /home/pi/.local/bin/whisper 4507-16021-0012.wav --language en --model tiny.en

cp tiny.en.pt ./.cache/whisper/tiny.en.pt

gedit /home/pi/.local/lib/python3.9/site-packages/whisper/decoding.py
  • Running failed; to solve this problem, see below (pip uninstall torch==2.0.0)
vi /opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/whisper/decoding.py
line 468
        print(tokenizer.sot, self.initial_tokens, "test")
        if self.initial_tokens.count(tokenizer.sot) > 0:
            self.sot_index: int = self.initial_tokens.index(tokenizer.sot)
        else:
            self.initial_tokens = (tokenizer.sot,)
            self.sot_index: int = 1
pip list | grep torch
pip uninstall torch==2.0.0
pip install torch==1.13.1
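  • Reposted: 《【RaspberryPi】Whisperで音声認識させてみた》 ([RaspberryPi] Trying speech recognition with Whisper)
    I tested the method from this article and successfully ran the PyTorch version of whisper on the 64-bit OS of the Raspberry Pi 4B. The method is to reinstall PyTorch, downgrading to PyTorch 1; PyTorch 2 raises an error: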
pip uninstall torch==2.1.0  
pip install torch==1.13.1  

Test command: whisper 2830-3980-0043.wav --language en --model tiny.en
It takes about 13 seconds

fquirin/speech-recognition-experiments, Whisper TFlite

  • https://github.com/fquirin/speech-recognition-experiments
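  • Performance and accuracy comparison of various emerging speech recognition engines on Raspberry Pi
  • Correction: I remembered wrongly. In fact whisper.cpp on AI Studio is slightly faster than whisper.cpp on aarch64 Android, which is faster than whisper.cpp on Raspberry Pi 4B, which in turn is faster than whisper.cpp on armeabi-v7a Android: 3 s < 4 s < 10 s < 12 s respectively. Also, the better the hardware, the faster it runs; for example, an Android tablet runs whisper.cpp faster than an Android phone
  • Someone on GitHub benchmarked various emerging speech recognition engines on Raspberry Pi 4 and Orange Pi 5 (RTF is the real-time factor, not very useful): fquirin/speech-recognition-experiments. Two notes of my own. First, running whisper Python on Raspberry Pi 4 seemed to have problems (addendum: solved by uninstalling torch 2 and installing version 1.13.1); possibly the author built and ran it himself. Second, my test of whisper.cpp on Raspberry Pi 4B takes about 10 seconds; I previously misremembered it as 3 seconds (3 seconds should be the AI Studio speed)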
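
For reference, the real-time factor (RTF) mentioned above is simply processing time divided by audio duration. The sketch below applies it to the four whisper.cpp timings from these notes. The helper name `real_time_factor` is hypothetical, and the clip length `ASSUMED_AUDIO_SECONDS` is a placeholder assumption, since the actual duration of the test sentence is not recorded here.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """RTF = processing time / audio duration; below 1.0 means faster than real time."""
    if audio_seconds <= 0:
        raise ValueError("audio_seconds must be positive")
    return processing_seconds / audio_seconds


# whisper.cpp timings (seconds per English sentence) as recorded in these notes
timings = {
    "AI Studio": 3.0,
    "Android aarch64": 4.0,
    "Raspberry Pi 4B": 10.0,
    "Android armeabi-v7a": 12.0,
}

# ASSUMPTION: placeholder clip length, replace with the real duration of the wav file
ASSUMED_AUDIO_SECONDS = 5.0

fastest = min(timings.values())
for name, seconds in timings.items():
    rtf = real_time_factor(seconds, ASSUMED_AUDIO_SECONDS)
    slowdown = seconds / fastest  # relative to the fastest platform, needs no assumption
    print(f"{name}: {seconds:.0f} s, RTF {rtf:.2f}, {slowdown:.1f}x the fastest")
```

The slowdown column depends only on the measured times, so it stays valid even if the assumed clip length is wrong.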

(TODO, TODO) Whisper TFlite (for android)

whisper.cpp for android (yeyupiaoling/Whisper-Finetune), fastest speed on Android is 4 seconds per English sentence (64-bit Android: 4 seconds at best; 32-bit Android: 12 seconds at best)

  • mod from https://gitee.com/yeyupiaoling/Whisper-Finetune
  • or https://github.com/yeyupiaoling/Whisper-Finetune
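  • whisper.cpp study. So far the best record is 4 seconds to recognize one English sentence (12 seconds on a 32-bit Android phone). This time I used the Android code from yeyupiaoling/Whisper-Finetune, swapped in the English model file, and set APP_OPTIM := release in Application.mk (the author says it must be packaged as a release build, and this is one way to get the speed-up). Without this setting it takes more than 100 seconds. In other words: (1) APP_OPTIM must be set (see Whisper-Finetune); (2) the latest NDK must be used; (3) the code must be unmodified and built with the correct C and C++ macro definitions (the version I built earlier used macro definitions different from those in Whisper-Finetune)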
  • search baidupan, Whisper-Finetune-master_v2.apk
  • search baidupan, Whisper-Finetune-master_v2_very_good_redmi12c_4sec_redmi9a_12sec.7z
  • Whisper-Finetune
    https://gitee.com/yeyupiaoling/Whisper-Finetune/tree/master/AndroidDemo
    search whisper.apk

whisper.cpp for android arm64 (whisper.android.java in ggerganov/whisper.cpp/examples), 5 seconds (tested on Redmi 12C)

  • https://github.com/ggerganov/whisper.cpp/tree/master/examples/whisper.android.java
  • It can run in as little as 5 seconds, but it has to be built for arm64 and with the latest NDK, not r10
  • search whisper.android.java_simple_v1_success_redmi12c_5sec.7z
  • search whisper.cpp-1.0.4_see_whisper_android-master.zip
  • search whisper.cpp_old_version.7z
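  • I compared them: the whisper.cpp Android build that runs correctly and reasonably fast (converting one English sentence within 20 seconds) comes from around version 1.0.4, roughly the earliest release (with very few changes; the code is almost identical to 1.0.4. The project became popular around April 2023, at about version 1.3). So with enough patience one could test every version from 1.0.4 to 1.5.0, build a suitable one for Android and measure its speed, to find out how quickly whisper.cpp on Android can recognize one English sentence. My rough estimate is under 10 seconds, 5 seconds at best. If the phone is somewhat higher-end (better than a Redmi, clocked above 2.0 GHz) or supports graphics acceleration, it might reach the Raspberry Pi 4B level and recognize an English sentence in about 3 seconds. This is only my speculation, though; I have not actually tested it (the best-equipped device I have at hand is a Huawei tablet, which I can test when I find time)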
  • search whisper.android.java_simple_v2_success_redmi12c_5sec_redmi9a_16sec.rar
  • <1> whisper 1.0.4 android:
    https://github.com/Digipom/WhisperCppAndroidDemo
  • <2> whisper tflite:
    https://github.com/vilassn/whisper_android
  • File name of the Android whisper.cpp.android.java build uploaded to the network drive
    whisper.android.java_v3_2830wav_use_146630ms.7z
    https://developer.android.google.cn/ndk/guides/cpu-arm-neon?hl=zh-cn#ndk-build_1
    LOCAL_ARM_NEON := true
    https://developer.android.google.cn/ndk/guides/cpp-support?hl=zh-cn
    Set the APP_STL variable to c++_shared, c++_static, none, or system. For example:
    https://www.itxm.cn/post/edafbg2b5.html

mozilla/Deepspeech test audio wav files

Format                         : Wave
Overall bit rate mode          : Constant
Overall bit rate               : 256 kb/s

Audio
Format                         : PCM
Format settings, Endianness    : Little
Format settings, Sign          : Signed
Codec ID                       : 1
Bit rate mode                  : Constant
Bit rate                       : 256 kb/s
Channel(s)                     : 1 channel
Sampling rate                  : 16.0 kHz
Bit depth                      : 16 bits
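
The 256 kb/s figure above follows from channels × sampling rate × bit depth (1 × 16000 × 16). A minimal sketch of checking this with Python's standard `wave` module is shown below; the helper name `pcm_bit_rate` is hypothetical, and the one-second silent clip is generated in memory only so the example runs without any of the actual test files.

```python
import io
import wave


def pcm_bit_rate(wav_file):
    """Return (channels, sample_rate_hz, bits_per_sample, bit_rate_bps) of a PCM WAV."""
    with wave.open(wav_file, "rb") as w:
        channels = w.getnchannels()
        rate = w.getframerate()
        bits = w.getsampwidth() * 8
    return channels, rate, bits, channels * rate * bits


# Build one second of 16 kHz, mono, 16-bit silence in memory (stand-in for a real file)
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)       # 1 channel
    w.setsampwidth(2)       # 2 bytes = 16 bits
    w.setframerate(16000)   # 16.0 kHz
    w.writeframes(b"\x00\x00" * 16000)
buf.seek(0)

print(pcm_bit_rate(buf))  # (1, 16000, 16, 256000)
```

Passing a file path instead of the in-memory buffer, for example `pcm_bit_rate("samples/output.wav")`, should work the same way for uncompressed PCM files.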

(TODO) TODO list, keep putting here at last