Pinned Repositories
PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
adaptive-knn-mt
algorithm-base
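An algorithm base built specifically for students who are just starting to practice coding problems. There is no "most detailed", only "more detailed"; it aims to use animations to explain obscure, hard-to-understand algorithms in plain, easy-to-follow terms!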
AnySubtitle
Make your videos accessible to a wider audience by adding subtitles in your target language, with support for videos in any language (for example, adding Chinese subtitles to an English video).
blind_watermark
Blind & Invisible Watermark (blind image watermarking; extracting the watermark does not require the original image!)
encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
JaxSpeechX
Fast and Effortless Speech Recognition Deployment with JAX
spleeter
Deezer source separation library including pretrained models.
Unconstrained-AVSR
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Zth9730's Repositories
Zth9730/AnySubtitle
Make your videos accessible to a wider audience by adding subtitles in your target language, with support for videos in any language (for example, adding Chinese subtitles to an English video).
Zth9730/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Zth9730/Unconstrained-AVSR
Zth9730/asteroid
The PyTorch-based audio source separation toolkit for researchers
Zth9730/JaxSpeechX
Fast and Effortless Speech Recognition Deployment with JAX
Zth9730/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Zth9730/awesome-source-free-test-time-adaptation
A curated list of papers in Test-time Adaptation, Test-time Training and Source-free Domain Adaptation
Zth9730/awesome-totally-open-chatgpt
A list of totally open alternatives to ChatGPT
Zth9730/bark
🔊 Text-prompted Generative Audio Model
Zth9730/blsp
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Zth9730/chirp
Zth9730/fairseq2
FAIR Sequence Modeling Toolkit
Zth9730/FastASR
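This is a project that implements ASR inference in C++. It has very few dependencies, is simple to install, and runs inference quickly; it also runs smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized versions of Google's Transformer model, trained on either the open-source WenetSpeech dataset (10,000+ hours) or an Alibaba private dataset (60,000+ hours), so recognition accuracy is also very good and comparable to many commercial ASR products.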
Zth9730/faster-whisper
Faster Whisper transcription with CTranslate2
Zth9730/icefall
Zth9730/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Zth9730/MaTe3D
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
Zth9730/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
Zth9730/MyArxiv
Zth9730/NeMo-text-processing
NeMo text processing for ASR and TTS
Zth9730/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Zth9730/Pengi
An Audio Language model for Audio Tasks
Zth9730/PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Zth9730/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Zth9730/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
Zth9730/s3prl
Audio Foundation Models (Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit)
Zth9730/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the paper SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Zth9730/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Zth9730/Whisper-Finetune
Fine-tune the Whisper speech recognition model and accelerate inference, with support for Web and Android deployment
Zth9730/Zth9730.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes