Pinned Repositories
1-billion-word-language-modeling-benchmark
Formerly known as code.google.com/p/1-billion-word-language-modeling-benchmark
3D-convolutional-speaker-recognition
Deep Learning & 3D Convolutional Neural Networks for Speaker Verification
3D-R2N2
Single/multi view image(s) to voxel reconstruction using a recurrent neural network
3d_face_gcns
PyTorch implementation of 3d_face_gcn + audioDVP paper
4D-Facial-Avatars
Dynamic Neural Radiance Fields for Monocular 4D Facial Avatar Reconstruction
955.WLB
955: a list of companies that do not require overtime
996.ICU
Repo for counting stars and contributing. Press F to pay respect to glorious developers.
MultiToneNet
recorder
ToneNet
ToneNet: A CNN Model of Tone Classification of Mandarin Chinese
saber5433's Repositories
saber5433/AudioLDM2
Text-to-Audio/Music Generation
saber5433/BigCodec
Official implementation of the paper "BigCodec: Pushing the Limits of Low-Bitrate Neural Speech Codec"
saber5433/bitsandbytes
Accessible large language models via k-bit quantization for PyTorch.
saber5433/ctc-forced-aligner
Text to speech alignment using CTC forced alignment
saber5433/DeepFilterNet
Noise suppression using deep filtering
saber5433/DiariZen
A toolkit for speaker diarization.
saber5433/F5-TTS
Official code for "F5-TTS: A Fairytaler that Fakes Fluent and Faithful Speech with Flow Matching"
saber5433/FCPE
saber5433/InspireMusic
InspireMusic: A fundamental toolkit for music, song and audio generation.
saber5433/LangSegment
A multilingual (97 languages) tool for automatic recognition and segmentation of mixed-language text content, intended for TTS.
saber5433/LibriTTS-P
LibriTTS-P: A Corpus with Speaking Style and Speaker Identity Prompts for Text-to-Speech and Style Captioning
saber5433/lina-speech
lina-speech: linear attention based text-to-speech
saber5433/LLaMA-Factory
Efficiently Fine-Tune 100+ LLMs in WebUI (ACL 2024)
saber5433/Make-An-Audio-2
A text-conditional diffusion probabilistic model capable of generating high-fidelity audio.
saber5433/mamba
Mamba SSM architecture
saber5433/MuCodec
saber5433/MuseTalk
MuseTalk: Real-Time High Quality Lip Synchronization with Latent Space Inpainting
saber5433/Music-Source-Separation-Training
Repository for training models for music source separation.
saber5433/NeMo-text-processing
NeMo text processing for ASR and TTS
saber5433/open_clip
An open source implementation of CLIP.
saber5433/python-audio-separator
Easy to use stem (e.g. instrumental/vocals) separation from CLI or as a python package, using a variety of amazing pre-trained models (primarily from UVR)
saber5433/S3Tokenizer
Reverse Engineering of Supervised Semantic Speech Tokenizer (S3Tokenizer) proposed in CosyVoice
saber5433/SimpleSpeech
The open source code for SimpleSpeech series
saber5433/speech-trident
Awesome speech/audio LLMs, representation learning, and codec models
saber5433/super-monotonic-align
saber5433/Supercodec
saber5433/text-labeler
A simple SVS labeling tool
saber5433/to-jyutping
Cantonese Pronunciation Automatic Labeling Tool
saber5433/ToJyutping
Cantonese Pronunciation Automatic Labeling Tool
saber5433/wetts
Production First and Production Ready End-to-End Text-to-Speech Toolkit