whaozl

Love Wang Xiao, love programming, love languages; happy coding, happy playing, be the best boy. AM coding, PM reading, one one up!

Shanghai, China

Pinned Repositories

aidatatang_200zh
Aidatatang_200zh is an open source Chinese Mandarin speech corpus released by DataTang Technology Co., Ltd (www.datatang.com).
Language: Shell 1 1 00
anjos2015
Some of my own algorithms from 2015
Language: Java 3 1 00
cn2an
📦 Quickly convert between Chinese numerals and Arabic numerals (latest features: conversion of fractions, dates, temperatures, and more)
Language: Python 1 0 00
deform-conv
Deformable Convolution in TensorFlow / Keras
Language: Python 1 2 00
focal-loss
Focal loss for Dense Object Detection
Language: Python 1 2 00
ICTCLAS
Java client for the Chinese Academy of Sciences word segmenter; supports Windows and Linux on x64 and x86.
Language: Java 1 1 00
MLAlgorithms
Minimal and clean examples of machine learning algorithms
Language: Python 1 1 00
whisper-plus
WhisperPlus: Advancing Speech-to-Text Processing 🚀
Language: Python 1 0 00

whaozl's Repositories

whaozl/whisper-plus
WhisperPlus: Advancing Speech-to-Text Processing 🚀
Language: Python 1 0 00
whaozl/3D-Speaker
A Repository for Single- and Multi-modal Speaker Verification, Speaker Recognition and Speaker Diarization
Language: Python 0 0
whaozl/attention-is-all-you-need-pytorch
A PyTorch implementation of the Transformer model in "Attention is All You Need".
Language: Python 0 0
whaozl/bertTokenizer
Java implementation of the BERT tokenizer; supports outputting ONNX tensors for ONNX model inference
whaozl/CapsWriter-Offline
CapsWriter: a bare-bones but handy offline edition, a voice input tool for PC
Language: Python 0 0
whaozl/ChatTTS
A generative speech model for daily dialogue.
Language: Python 0 0
whaozl/faster-whisper
Faster Whisper transcription with CTranslate2
whaozl/GLM-4
GLM-4 series: Open Multilingual Multimodal Chat LMs
Language: Python 0 0
whaozl/icefall
Language: Python 0 0
whaozl/k2-v2.0-pre-branch-HLG
FSA/FST algorithms, differentiable, with PyTorch compatibility.
Language: Cuda 0 0
whaozl/kaldi-native-fbank
Kaldi-compatible online fbank extractor without external dependencies
Language: C++ 0 0
whaozl/Leaderboard
SpeechIO Leaderboard: a large, robust, comprehensive, benchmarking platform for Automatic Speech Recognition.
Language: Python 0 0
whaozl/LLaSM
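The first open-source, commercially usable dialogue model to support bilingual Chinese-English speech-text multimodal conversation. Convenient speech input greatly improves the experience of using large models that take text as input, while avoiding the cumbersome pipeline of ASR-based solutions and the errors they may introduce.
Language: Python 0 0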
whaozl/RealSI
RealSI: Open Benchmark for Simultaneous Interpretation in Real-world Scenarios
whaozl/Recorder
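HTML5 JS audio recording in mp3, wav, ogg, webm, and amr formats; supports PC and some Android and iOS browsers, Hybrid Apps (Android and iOS app source code provided), and WeChat; provides ASR speech-to-text, an H5 voice call and chat example, and DTMF encoding and decoding
Language: JavaScript 0 0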
whaozl/riva-asrlib-decoder
Standalone implementation of the CUDA-accelerated WFST Decoder available in Riva
Language: Python 0 0
whaozl/SD-Eval
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
whaozl/sherpa
Speech-to-text server framework with next-gen Kaldi
Language: Python 0 0
whaozl/sherpa-onnx
Real-time speech recognition using next-gen Kaldi with onnxruntime without Internet connection. Support embedded systems, Android, iOS, Raspberry Pi, x86_64 servers, websocket server/client, C/C++, Python, Kotlin
Language: C++ 0 0
whaozl/silero-vad5
Silero VAD: pre-trained enterprise-grade Voice Activity Detector
Language: Python 0 0
whaozl/speech-to-speech
whaozl/SpeechT5
Unified-Modal Speech-Text Pre-Training for Spoken Language Processing
Language: Python 0 0
whaozl/TeleSpeech-ASR
Language: Python 0 0
whaozl/TMSpeech
A slacking-off tool for Tencent Meeting
Language: C# 0 0
whaozl/west
We Speech Transcript based on LLM, in 300 lines of code.
whaozl/Whisper-Finetune
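Fine-tune the Whisper speech recognition model, with support for training on data without timestamps, data with timestamps, and data without speech. Accelerated inference, with support for Web, Windows desktop, and Android deployment
Language: C 0 0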
whaozl/whisper-jni
A JNI wrapper for whisper.cpp that allows transcribing speech to text in Java.
whaozl/whisper-medusa
Whisper with Medusa heads
whaozl/whisper.cpp
Port of OpenAI's Whisper model in C/C++
whaozl/whisperX
WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)