Zth9730

University of Science and Technology Beijing

Computer of Science and Technology Beijing

Pinned Repositories

PaddleSpeech
Easy-to-use Speech Toolkit including Self-Supervised Learning model, SOTA/Streaming ASR with punctuation, Streaming TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language:Python11.4k 185 1.9k1.9k
adaptive-knn-mt
Language:Python1 0 00
algorithm-base
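An algorithm base prepared especially for students who are just starting to practice coding problems. There is no "most detailed", only "more detailed": the aim is to use animations to explain obscure, hard-to-follow algorithms in plain terms!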
0 1 00
AnySubtitle
Make your videos accessible to a wider audience by adding subtitles in your target language, with support for videos in any language. (For example, add Chinese subtitles to an English video.)
Language:Python1 1 00
blind_watermark
Blind & Invisible Watermark (blind image watermarking; extracting the watermark does not require the original image!)
Language:Python1 0 00
encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language:Python1 0 00
JaxSpeechX
Fast and Effortless Speech Recognition Deployment with JAX
0 0 00
spleeter
Deezer source separation library including pretrained models.
Language:Python1 0 00
Unconstrained-AVSR
1 1 00
wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language:Python0 0 01

Zth9730's Repositories

Zth9730/AnySubtitle
Make your videos accessible to a wider audience by adding subtitles in your target language, with support for videos in any language. (For example, add Chinese subtitles to an English video.)
Language:Python1 1 00
Zth9730/encodec
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
Language:Python1 0 00
Zth9730/Unconstrained-AVSR
1 1 00
Zth9730/asteroid
The PyTorch-based audio source separation toolkit for researchers
Language:Python0 0 00
Zth9730/JaxSpeechX
Fast and Effortless Speech Recognition Deployment with JAX
0 0 00
Zth9730/wenet
Production First and Production Ready End-to-End Speech Recognition Toolkit
Language:Python0 0 01
Zth9730/awesome-source-free-test-time-adaptation
A curated list of papers in Test-time Adaptation, Test-time Training and Source-free Domain Adaptation
0 0
Zth9730/awesome-totally-open-chatgpt
A list of totally open alternatives to ChatGPT
0 0
Zth9730/bark
🔊 Text-prompted Generative Audio Model
Language:Python0 0
Zth9730/blsp
BLSP: Bootstrapping Language-Speech Pre-training via Behavior Alignment of Continuation Writing
Language:Python0 0
Zth9730/chirp
Language:Python0 0
Zth9730/fairseq2
FAIR Sequence Modeling Toolkit
Language:Python0 0
Zth9730/FastASR
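A project that implements ASR inference in C++. It has few dependencies, is simple to install, and runs inference fast; it also runs smoothly on ARM platforms such as the Raspberry Pi 4B. The supported models are optimized from Google's Transformer model, and the training data is the open-source wenetspeech dataset (10,000+ hours) or Alibaba's private dataset (60,000+ hours), so recognition quality is good and comparable to many commercial ASR products.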
Language:C0 0
Zth9730/faster-whisper
Faster Whisper transcription with CTranslate2
Language:Python0 0
Zth9730/icefall
Language:Python0 0
Zth9730/Macaw-LLM
Macaw-LLM: Multi-Modal Language Modeling with Image, Video, Audio, and Text Integration
Language:Python0 0
Zth9730/MaTe3D
MaTe3D: Mask-guided Text-based 3D-aware Portrait Editing
0 0
Zth9730/MS-SNSD
The Microsoft Scalable Noisy Speech Dataset (MS-SNSD) is a noisy speech dataset that can scale to arbitrary sizes depending on the number of speakers, noise types, and Speech to Noise Ratio (SNR) levels desired.
Language:Python0 0
Zth9730/MyArxiv
Language:CSS0 0
Zth9730/NeMo-text-processing
NeMo text processing for ASR and TTS
Zth9730/PaddleSpeech
Easy-to-use Speech Toolkit including SOTA/Streaming ASR with punctuation, influential TTS with text frontend, Speaker Verification System, End-to-End Speech Translation and Keyword Spotting. Won NAACL2022 Best Demo Award.
Language:Python0 0
Zth9730/Pengi
An Audio Language model for Audio Tasks
0 0
Zth9730/PromptingWhisper
Prompting Whisper for Audio-Visual Speech Recognition, Code-Switched Speech Recognition, and Zero-Shot Speech Translation
Language:Python0 0
Zth9730/RepCodec
Models and code for RepCodec: A Speech Representation Codec for Speech Tokenization
Language:Python0 0
Zth9730/RetNet
An implementation of "Retentive Network: A Successor to Transformer for Large Language Models"
Language:Python0 0
Zth9730/s3prl
Audio Foundation Models (Self-Supervised Speech/Sound Pre-training and Representation Learning Toolkit)
Language:Python0 0
Zth9730/SpeechTokenizer
This is the code for the SpeechTokenizer presented in the SpeechTokenizer: Unified Speech Tokenizer for Speech Language Models. Samples are presented on
Language:Python0 0
Zth9730/unilm
Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities
Zth9730/Whisper-Finetune
Fine-tune the Whisper speech recognition model and accelerate inference, with support for Web deployment and Android deployment
Language:C0 0
Zth9730/Zth9730.github.io
Github Pages template for academic personal websites, forked from mmistakes/minimal-mistakes
Language:JavaScript0 0