Pinned Repositories
30dayMakeOS
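Chinese-language edition of the source code for the book 《30天自制操作系统》 ("Build Your Own Operating System in 30 Days"). The process of building an operating system (OSASK) yourself.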
500lines
500 Lines or Less
acapellabot
Acapella Extraction with a ConvNet
AI_Composer
AI Composer for Machine Learning for Hackers #2
Alibaba-MIT-Speech
Alibaba speech technology
caffe_ocr
An experimental project for researching mainstream OCR algorithms; currently implements the CNN+BLSTM+CTC architecture.
CAT
A CRF-based ASR Toolkit
Lightweight-Transducer
Official implementation of the Interspeech 2024 paper "Lightweight Transducer Based on Frame Level Criterion".
VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
vadnet
Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks
wangmengzhi's Repositories
wangmengzhi/Lightweight-Transducer
Official implementation of the Interspeech 2024 paper "Lightweight Transducer Based on Frame Level Criterion".
wangmengzhi/CAT
A CRF-based ASR Toolkit
wangmengzhi/VAD
Voice activity detection (VAD) toolkit including DNN, bDNN, LSTM and ACAM based VAD. We also provide our directly recorded dataset.
wangmengzhi/vadnet
Real-time Voice Activity Detection in Noisy Environments using Deep Neural Networks
wangmengzhi/30dayMakeOS
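Chinese-language edition of the source code for the book 《30天自制操作系统》 ("Build Your Own Operating System in 30 Days"). The process of building an operating system (OSASK) yourself.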
wangmengzhi/AlphaZero_Gomoku
An implementation of the AlphaZero algorithm for Gomoku (also called Gobang or Five in a Row)
wangmengzhi/ConvBert
wangmengzhi/DaCiDian
DaCiDian is an open-source Mandarin Chinese lexicon for automatic speech recognition (ASR)
wangmengzhi/deep-learning
My own deep learning project
wangmengzhi/denoising_DIHARD18
wangmengzhi/DWMBlurGlass
Adds a custom effect to the global system title bar; supports Windows 10 and Windows 11.
wangmengzhi/eesen
The official repository of the Eesen project
wangmengzhi/espnet
End-to-End Speech Processing Toolkit
wangmengzhi/models-1
Model configurations
wangmengzhi/my_ch_speech_recognition
Speech recognition in Python: a deep-learning-based Chinese speech recognition system
wangmengzhi/py_speech_seg
A toolkit to implement segmentation on speech based on BIC
wangmengzhi/pyannote-audio
Neural building blocks for speaker diarization: speech activity detection, speaker change detection, speaker embedding
wangmengzhi/pyAudioAnalysis
Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications
wangmengzhi/Qwen-VL
The official repo of Qwen-VL (通义千问-VL) chat & pretrained large vision language model proposed by Alibaba Cloud.
wangmengzhi/serving
A flexible, high-performance serving system for machine learning models
wangmengzhi/SpectralCluster
Python re-implementation of the spectral clustering algorithm in the paper "Speaker Diarization with LSTM"
wangmengzhi/tacotron
A TensorFlow implementation of Google's Tacotron speech synthesis with pre-trained model (unofficial)
wangmengzhi/tensorpack
A Neural Net Training Interface on TensorFlow, with focus on speed + flexibility
wangmengzhi/torchscale
Foundation Architecture for (M)LLMs
wangmengzhi/TranslucentFlyouts
Translucent effect for most of the win32 flyouts
wangmengzhi/uis-rnn
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.
wangmengzhi/voicefilter
Unofficial PyTorch implementation of Google AI's VoiceFilter system
wangmengzhi/warp-ctc
Fast parallel CTC.
wangmengzhi/warp-transducer
A fast parallel implementation of RNN Transducer.
wangmengzhi/zhihu
This repo contains the source code from my personal column (https://zhuanlan.zhihu.com/zhaoyeyu), implemented in Python 3.6. It includes Natural Language Processing and Computer Vision projects such as text generation, machine translation, deep convolutional GAN, and other hands-on code.